Notes
Categories
MachineLearning
(WIP) Expectation Maximization (EM) vs Variational Inference (VI)
(WIP) Iterative Optimization Algorithms for ML
(WIP) Regression (4/7) - Kernelization and Gaussian Processes
Classification (1/4) - Logistic Regression and Optimization
Error BackPropagation
Generative vs Discriminative Models
Information Theory, Entropy and Kullback-Leibler Divergence (KLD)
L1 & L2 Regularization
MLE & Bayesian Series (1/3) - Maximum Likelihood Estimation (MLE)
MLE & Bayesian Series (2/3) - Maximum A Posteriori (MAP)
MLE & Bayesian Series (3/3) - Bayesian Approach
Neural Network (NN) and Representation
Precision, Recall and F1 Score
Principal Component Analysis (PCA) and AutoEncoder (AE)
Regression (1/7) - Linear Regression
Regression (2/7) - Bayesian Linear Regression
Regression (3/7) - Non-linear Regression
Development
Docker
Useless Commands for Code Editors
Useless Github Debugging History
Deep_Generative_Model
(RE) A Long Way to Deep Generative Models - Variational AutoEncoders (VAEs)
(WIP) Recent Advances in Deep Generative Models (1/4) - Diffusion Models
(yet) Diffusion-based Video Generation
Deep_Reinforcement_Learning
(CS285) Lecture 2 - Supervised Learning of Behaviors
(CS285) Lecture 4 - Introduction to Reinforcement Learning
(CS285) Lecture 5 - Policy Gradients
(CS285) Lecture 6 - Actor-Critic Algorithms
(CS285) Lecture 7 - Value Function Methods
(CS285) Lecture 8 - Deep RL with Q-Functions
(CS285) Lecture 9 - Advanced Policy Gradients
(WIP) (CS285) Lecture 10 - Optimal Control and Planning
(WIP) (CS285) Lecture 18 - Variational Inference and Generative Models
(WIP) (CS285) Lecture 19 - Reframing Control as an Inference Problem
(WIP) (CS285) Lecture 20 - Inverse Reinforcement Learning
(WIP) DDPG, TD3 and SAC
(WIP) Deep dive into TRPO and PPO
(WIP) Distributional RL (Categorical DQN (C51), Quantile Regression DQN (QR-DQN) and so on)
(WIP) Off-Policy RL
(WIP) Offline RL (corresponding to CS285 Lec 15 and 16)
(yet) From AlphaGo to MuZero
Before Studying RL, Why Do We Have to Dive into RL?
Resources
The Graphics Processing Unit (GPU) Revolution (1) - Why Are GPUs Important for Deep Learning?
Useful and Insightful Machine Learning Websites
Speech
(ASR) A Long Way To CTC BeamSearch (1)
DeepLearning
(WIP) Dropout and Bayesian Deep Learning
(WIP) Maximal Update (Mu) Parametrization (μP) and Hyperparameter Transfer (μTransfer)
(WIP) Rethinking Weight Decay and LLM without Bias Term
(WIP) Trivial (but critical) Training Techniques for Neural Networks
(almost) A Comparison of Positional Embedding Methods
(yet) (Paper) LongRoPE
Convolution Families
Course Overview of CSC2541 (Topics in ML - Neural Net Training Dynamics)
Deep Dive into Low Rank Adaptation (LoRA)
Large Batch Training Difficulties
Inspiration
About Processing Data
An Opinionated Guide to ML Research from John Schulman
Difference between Research Scientist vs Engineer
What About DxxxMind's Interviews?
How To Be Successful? from Sam Altman
The Bitter Lesson from Richard Sutton
What Happened on the Way to Getting a Job as an ML Researcher
Pytorch_Implementation
(WIP) Differentiable Sampling for Discrete Distribution
CrossEntropyLoss vs NLL (feat. REINFORCE)
Gradient Clipping
Implementation of Contrastive Loss (InfoNCE Loss)
LLM-RLHF Series (6/6) - Implementation Details of PPO and RLHF
PyTorch Implementation of Variational AutoEncoders (VAEs)
REINFORCE and Actor-Critic
Training_and_Inference_Optimization
(WIP) Blockwise Ring Attention
(WIP) CheatSheet for Training Transformer (FLOPs, Time/Space Complexity)
(WIP) Distributed Training (Parallelism, ZeRO and so on)
(WIP) How Is Mixtral Trained Efficiently?
(WIP) Sequence Generation Techniques (Beam Search and Sampling)
(almost) PyTorch Efficient Scaled Dot Product Attention (SDPA)
(yet) Resources for Torch Internals (Autograd and More), CUDA, Compilers, and So On
Dynamic Batching (Token Batching) for Sequence Dataset with Variable Lengths
Training DNN with Reduced Precision Floating-Point Format
RLHF
(Paper) Distributional Preference Learning (DPL)
(WIP) (Paper) Back to Basics, Revisiting REINFORCE Style Optimization for RLHF
(WIP) (Paper) ODIN, Disentangled Reward Mitigates Hacking in RLHF
(WIP) Aligning LLM with Offline RL
(yet) How to Reinforce Reasoning Capability with a Verifier
(yet) Offline Actor-Critic (Perceiver Actor-Critic; PAC) Algorithm
(yet) Study on Preference-Based Learning