Primary literature · systems research

Labs, platforms, vector stores & agentic IDEs—mapped to papers

Tie each landmark PDF to what teams ship: frontier labs and silicon, SageMaker / Vertex-style training planes, embedding databases behind RAG evals, and Cursor-style agents that operationalize tool-use research.

Frontier labs · alignment · silicon

Where primary literature meets today’s builders

These are the teams, publications, and stacks researchers cite when arguing about scaling, safety, hardware, and deployment—your reading map stays grounded in real shipping systems.

Google

DeepMind · Gemini · research @ scale

Gemini
DeepMind
AlphaFold-class R&D

OpenAI

Frontier APIs · reasoning evals

GPT
o-series
human preference RLHF


Anthropic

Interpretability · constitutional RLHF

Claude
Constitutional AI
safety bench

Microsoft

Research · Azure AI science

Phi / Orca lineage
Copilot science
Azure AI

NVIDIA

CUDA graphs · inference · NeMo research

CUDA
Tensor Core MLPerf
NeMo

Intel

Accelerators · edge AI science

Gaudi
OpenVINO
oneAPI AI

Apple

MLX · on-device ML research

MLX
Core ML
Apple Intelligence R&D

IBM

IBM Research · enterprise AI science

watsonx.ai
Granite models
AI governance


Agent research

Tools · planners · grounded retrieval

ReAct / tool papers
RAG evals
orchestration

Platforms · benchmarks · reproducibility

The training & experiment stacks behind modern papers

Vertex AI, SageMaker, and open frameworks are how teams turn ablation studies into repeatable numbers—matching the methods sections you read (distributed training, tracking, registry, serving).

Google Cloud

Vertex AI · managed training & batch

Vertex AI
BigQuery ML
TPU pods

AWS

SageMaker · Bedrock research sandboxes

SageMaker
Bedrock
ParallelCluster

Azure

Azure ML · enterprise MLOps

Azure ML
Fabric notebooks
ONNX Runtime

Databricks

Lakehouse · MLflow lineage

MLflow
Delta Lake
Mosaic training

Docker

Reproducible envs · containerized training

CUDA images
Compose stacks
CI GPU runners

MongoDB

Atlas · vectors · AI workload data

Atlas Vector Search
RAG stores
aggregation pipelines

PyTorch

Dynamic graphs · distributed research code

torch.compile
FSDP
TorchTune

TensorFlow

Graphs · TPU / XLA pathways

Keras 3
tf.data
TF Serving

Hugging Face

Open weights · PEFT · leaderboards

Transformers
PEFT / LoRA
Open LLM Leaderboard

Weights & Biases

Experiment tracking · sweeps · model registry

W&B Sweeps
Registry
Artifacts

Vector databases · retrieval memory

Stores behind embeddings, hybrid search & RAG

ANN indexes, hybrid filters, and replication—the stacks RAG papers implicitly benchmark when they claim retrieval-augmented gains.
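As a rough illustration of what "index plus hybrid filter" means, the sketch below runs an exact (brute-force) cosine search with a metadata pre-filter. Production stores replace the linear scan with an approximate index such as HNSW. The record layout and the `search` helper are hypothetical and do not mirror any vendor's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query, k=2, where=None):
    """Exact top-k search: filter on metadata first, then rank by cosine."""
    where = where or {}
    candidates = [
        r for r in records
        if all(r["meta"].get(key) == value for key, value in where.items())
    ]
    ranked = sorted(candidates, key=lambda r: cosine(r["vector"], query), reverse=True)
    return [r["id"] for r in ranked[:k]]


# Toy corpus with made-up vectors and metadata (assumption, for illustration only).
RECORDS = [
    {"id": "a", "vector": [1.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vector": [0.9, 0.1], "meta": {"lang": "de"}},
    {"id": "c", "vector": [0.0, 1.0], "meta": {"lang": "en"}},
]

print(search(RECORDS, [1.0, 0.0], k=2))                        # unfiltered
print(search(RECORDS, [1.0, 0.0], k=2, where={"lang": "en"}))  # filtered to English
```

Filtering before ranking keeps results exact here; with an approximate index, the order of filtering and search becomes a recall trade-off that RAG evaluations rarely report.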

Pinecone

Managed vectors · namespaces · metadata filters

Serverless & pod-based indexes
Hybrid sparse+dense
Metadata pruning

Weaviate

GraphQL vectors · modular retrieval

HNSW / PQ
Multi-tenancy
Generative stack

Qdrant

Rust core · filtering-heavy RAG

Scalar filtering
Quantization
Distributed HA

Milvus

Open vectors · billion-scale ANN

GPU index build
Tiered storage
Attu ops UI

Chroma

Embedded UX · rapid RAG prototypes

Collections API
Persistence modes
Embedding adapters

PostgreSQL

pgvector · relational + embeddings

pgvector HNSW
Hybrid SQL+RAG
RDS / Aurora

Redis

RediSearch · low-latency vectors

Vector similarity
JSON search
Enterprise HA

Elasticsearch

dense_vector · lexical + semantic

dense_vector mapping
RRF fusion
ES|QL analytics
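Reciprocal rank fusion (RRF) is simple enough to sketch in a few lines. This is a minimal standalone version, not Elasticsearch's implementation: each document scores the sum of 1 / (k + rank) across the input rankings, with k = 60 as a commonly used default. The document ids and the two input lists are made up.

```python
from collections import defaultdict


def rrf(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["d1", "d2", "d3"]    # e.g. a BM25 ranking (hypothetical)
semantic = ["d3", "d1", "d4"]   # e.g. a vector ranking (hypothetical)
print(rrf([lexical, semantic]))
```

Because RRF uses only ranks, it needs no score normalization between the lexical and vector retrievers.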

OpenSearch

Open lineage · k-NN serving

k-NN replication
Lexical fusion
Managed AOSS

Vespa

Ranking · tensors · hybrid serving

Tensor ranking
BM25 + ANN
Large-scale inference

Agentic IDEs · coding agents

Editors where models drive repos, tests & terminals

Cursor-style agents and Windsurf-class flows operationalize tool-use papers—multi-file edits, terminals, and PR-aware refactors beside every landmark PDF.

Cursor

Composer · codebase-wide agent edits

Composer agent
@docs grounding
Privacy tiers

Windsurf

Cascade · Codeium IDE lineage

Cascade flows
Repo-aware edits
Terminal agents

GitHub Copilot

Copilot Chat · workspace agents

Copilot Chat
CLI agent
PR summaries

VS Code

Extensions hub · Copilot host IDE

Remote SSH
Dev Containers
Copilot UX

Codeium

Autocomplete · IDE-native reasoning

Repo embeddings
Inline chat
JetBrains + VS

JetBrains

AI Assistant · Junie · Fleet

AI Assistant
Junie tasks
Kotlin/Python stacks

Replit

Agent · Ghostwriter · hosted shells

Replit Agent
Ghostwriter
Always-on runtimes

Neovim

Keyboard-first · LSP + AI plugins

Copilot.vim lineage
LSP hubs
Research scripting

Google Colab

Notebook GPUs · Gemini coding UX

TPU/GPU kernels
Gemini cells
BigQuery hooks

Gitpod

Cloud workspaces · prebuild parity

Prebuilds
Parallel sandboxes
Dockerfile CI sync

Systems

GPU compute

Parallel training & inference

Models

Architectures

Layers, attention, inductive bias

Ingest → ETL → Features → Train → Serve

Data

Pipelines & features

ETL, labeling, quality gates

Training

Loops & scaling

Optimization, eval, deployment

Deep AI ML Research Lab

Where AI research meets systems and products

Architectures, training dynamics, evaluation, and deployment—plus how today’s stacks combine RAG, tools, and agentic reasoning. Papers are anchors; the through-line is practitioner-grade AI research literacy.

Models & representation
Training & scaling
Evaluation & robustness
Systems & deployment

Frontier topics · primary literature

RAG, agents & multimodal AI

Six curated lanes where industry moved fastest—each links into our PDF viewer with reading maps and literacy notes so you engage like a practitioner, not a tourist.

RAG & grounding

Retrieve evidence first, generate second—the blueprint for factual assistants and enterprise copilots.

Knowledge APIs
Citation fidelity

Open landmark paper
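"Retrieve first, generate second" can be sketched without any model at all. The example below assumes a toy token-overlap scorer and a hypothetical prompt template; the generation call itself is left out. The passage ids and texts are invented for illustration.

```python
def retrieve(query, passages, k=2):
    """Rank passages by the number of lowercase tokens shared with the query."""
    q_tokens = set(query.lower().split())
    scored = [
        (len(q_tokens & set(text.lower().split())), pid)
        for pid, text in passages.items()
    ]
    scored = [(score, pid) for score, pid in scored if score > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pid for _, pid in scored[:k]]


def build_prompt(query, passages, hits):
    """Put retrieved evidence before the question, tagged so answers can cite it."""
    evidence = "\n".join(f"[{pid}] {passages[pid]}" for pid in hits)
    return f"Evidence:\n{evidence}\n\nQuestion: {query}\nAnswer with citations like [p1]."


PASSAGES = {
    "p1": "LoRA adds low-rank adapters to frozen weights",
    "p2": "RRF fuses lexical and semantic rankings",
    "p3": "Adapters keep the base model weights frozen",
}
QUERY = "which method keeps weights frozen"
hits = retrieve(QUERY, PASSAGES)
prompt = build_prompt(QUERY, PASSAGES, hits)
print(prompt)
```

Tagging each passage with its id is what makes citation fidelity checkable: an evaluator can verify that every cited id was actually in the prompt.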

Agentic reasoning

Thought traces paired with actions—inspect trajectories instead of praying for one-shot answers.

Tools
Planning loops

Open landmark paper
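A thought, action, observation loop in the ReAct style might look like the sketch below. The `scripted_policy` function is a hypothetical stand-in for a language model, and the single lookup tool is a dictionary; the point is only that the full trajectory is recorded and can be inspected afterwards.

```python
def run_agent(policy, tools, question, max_steps=5):
    """Alternate thought, action, observation until the policy finishes."""
    trajectory = []
    for _ in range(max_steps):
        thought, action, arg = policy(question, trajectory)
        if action == "finish":
            trajectory.append({"thought": thought, "action": "finish", "observation": arg})
            return arg, trajectory
        observation = tools[action](arg)
        trajectory.append(
            {"thought": thought, "action": f"{action}({arg!r})", "observation": observation}
        )
    return None, trajectory  # step budget exhausted without a final answer


def scripted_policy(question, trajectory):
    """Stand-in for a model: look something up once, then answer from what came back."""
    if not trajectory:
        return "I need the publication year.", "lookup", "seq2seq"
    return "The observation answers the question.", "finish", trajectory[-1]["observation"]


# Hypothetical tool: a tiny lookup table (year taken from this page's "Start here" list).
TOOLS = {"lookup": {"seq2seq": "2014"}.get}

answer, steps = run_agent(scripted_policy, TOOLS, "When was seq2seq published?")
for step in steps:
    print(step)
```

The `max_steps` budget is a design choice worth keeping in any real loop: it bounds cost when the policy never emits `finish`.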

Models that call tools

Sparse delegation to calculators, search, and APIs: the model calls a tool only when it helps, and production AI stacks route requests in much the same way.

Toolformer lineage
Safety routing

Open landmark paper
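Routing a model-emitted call to a tool can be sketched as a small dispatcher. The call format (`{"tool": ..., "input": ...}`) and the `route` helper are assumptions made for this example, not a format from the Toolformer paper; the calculator walks the syntax tree instead of calling `eval()` so that only arithmetic is accepted.

```python
import ast
import operator

OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expr):
    """Evaluate +, -, *, / over numeric literals without using eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval").body)


TOOLS = {"calc": calculator}


def route(call):
    """Dispatch a structured call; unknown tools are refused rather than guessed."""
    tool = TOOLS.get(call.get("tool"))
    if tool is None:
        return {"ok": False, "error": "unknown tool"}
    try:
        return {"ok": True, "result": tool(call["input"])}
    except (ValueError, SyntaxError, ZeroDivisionError) as exc:
        return {"ok": False, "error": str(exc)}


print(route({"tool": "calc", "input": "2 * (3 + 4)"}))
print(route({"tool": "shell", "input": "ls"}))
```

Returning a structured error instead of raising gives the model an observation it can react to, which is one simple form of safety routing.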

Vision × language

Contrastive image-text alignment unlocked zero-shot vision classifiers and sits upstream of text conditioning in diffusion models.

CLIP era
Embeddings

Open landmark paper
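Zero-shot classification with aligned embeddings reduces to a softmax over cosine similarities. The sketch below assumes the image and label-prompt embeddings already exist (the two-dimensional vectors are invented), and the temperature of 0.07 is an illustrative value rather than a tuned one.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot(image_emb, label_embs, temperature=0.07):
    """Softmax over cosine similarities between one image and each label prompt."""
    img = normalize(image_emb)
    logits = {
        label: sum(a * b for a, b in zip(img, normalize(emb))) / temperature
        for label, emb in label_embs.items()
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(z - peak) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Made-up embeddings: the image points mostly along the "cat" direction.
probs = zero_shot([0.9, 0.1], {"cat": [1.0, 0.0], "dog": [0.0, 1.0]})
print(max(probs, key=probs.get), probs)
```

No classifier head is trained: adding a class means adding one more text embedding to `label_embs`.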

Efficient specialization

Low-rank adapters train small update matrices while the foundation weights stay frozen, which is how teams ship vertical AI without cloning GPT-scale weights.

LoRA / PEFT
Serving

Open landmark paper
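The low-rank update can be written out with plain lists. This sketch assumes the usual LoRA form, an effective weight of W + (alpha / r) · B·A with W frozen and only A and B trained; the matrices are toy values and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, b, a, alpha):
    """Effective weight W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)  # A has shape r x d_in, so its row count is the rank
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, r):
    """Adapter parameter count: B is d_out x r, A is r x d_in."""
    return r * (d_out + d_in)


W = [[1, 0], [0, 1]]   # frozen base weight (toy)
B = [[1], [2]]         # d_out x r, with r = 1
A = [[3, 4]]           # r x d_in
print(lora_weight(W, B, A, alpha=2))
print(trainable_params(4096, 4096, 8), "adapter params vs", 4096 * 4096, "full")
```

For a 4096 × 4096 layer at rank 8, the adapter holds 65,536 parameters against 16,777,216 in the full matrix, a 256-fold reduction in what must be trained and stored per task.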

Reasoning prompts

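Scratchpads before answers: prompting a model to write out intermediate steps needs no extra training and minimal math, yet it delivered large reasoning gains well before RL-heavy agent trainers.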

Inference compute
Emergence

Open landmark paper

Reading journeys

Paths through primary AI literature

These are intentional sequences—not a generic playlist. Follow one path to build a coherent mental model: how ideas cite, critique, and replace each other in modern ML and AI systems research.

5 papers · ~10 hrs suggested

From sequence models to modern LLMs

Encoder–decoder intuition → self-attention → pre-training at scale.

4 papers · ~7 hrs suggested

Vision, depth, and generative shifts

CNN watershed moments → residual depth → diffusion fundamentals.

2 papers · ~5 hrs suggested

Learning from interaction

Deep RL from pixels → planning under uncertainty.

3 papers · ~6 hrs suggested

Agents, tools & reasoning traces

Prompted reasoning → language-conditioned actions → learned tool invocation.

Start here

Three entry points from our curated set—open any paper in the lab viewer, then follow the method & experiments thread.

Paper 1 · ImageNet Classification with Deep Convolutional Neural Networks · Krizhevsky, Sutskever & Hinton · NeurIPS 2012
Paper 2 · Generative Adversarial Nets · Goodfellow et al. · NeurIPS 2014
Paper 3 · Sequence to Sequence Learning with Neural Networks · Sutskever, Vinyals & Le · NeurIPS 2014

Evidence layer · tiered

16 landmark papers — beginner → professional

Filters aren’t gates—they’re pacing guides. Move up when experiment sections feel familiar and limitations spark ideas instead of confusion.


From research insight to production

Where AI research becomes products

Clear hypotheses, solid metrics, and reproducible pipelines are as important in the lab as in shipping. Below are example domains where research-grade ML meets real users, compliance, and scale.

Assembly line — research → AI products

Paper → Model → Train → Eval → Ship → Product

Healthcare & life sciences

Clinical & imaging AI

Triaging, radiology assistants, and pathway support—always with audit logs, calibration checks, and human-in-the-loop review grounded in published benchmarks.
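One common form a calibration check can take is expected calibration error (ECE): bin predictions by confidence and compare each bin's average confidence with its accuracy. The sketch below is a simplified version with equal-width bins that are closed on the left; the inputs are made-up numbers, not results from any clinical system.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between confidence and accuracy across equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(h for _, h in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Four predictions at 90% confidence, three of them right: overconfident by 0.15.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A low ECE says stated confidence tracks observed accuracy on this sample; it does not replace the audit logs or human review mentioned above.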