Notes
Categories
MachineLearning
(WIP) Expectation Maximization (EM) vs Variational Inference (VI)
(WIP) Iterative Optimization Algorithms for ML
(WIP) Regression (4/7) - Kernelization and Gaussian Processes
Classification (1/4) - Logistic Regression and Optimization
Error BackPropagation
Generative vs Discriminative Models
Information Theory, Entropy and Kullback-Leibler Divergence (KLD)
L1 & L2 Regularization
MLE & Bayesian Series (1/3) - Maximum Likelihood Estimation (MLE)
MLE & Bayesian Series (2/3) - Maximum A Posteriori (MAP)
MLE & Bayesian Series (3/3) - Bayesian Approach
Neural Network (NN) and Representation
Precision, Recall and F1 Score
Principal Component Analysis (PCA) and AutoEncoder (AE)
Regression (1/7) - Linear Regression
Regression (2/7) - Bayesian Linear Regression
Regression (3/7) - Non-linear Regression
Development
Docker
Useless Commands for Code Editors
Useless Github Debugging History
Deep_Generative_Model
(RE) A Long Way to Deep Generative Models - Variational AutoEncoders (VAEs)
(WIP) Recent Advances in Deep Generative Models (1/4) - Diffusion Models
(yet) Diffusion-based Video Generation
Deep_Reinforcement_Learning
(CS285) Lecture 2 - Supervised Learning of Behaviors
(CS285) Lecture 4 - Introduction to Reinforcement Learning
(CS285) Lecture 5 - Policy Gradients
(CS285) Lecture 6 - Actor-Critic Algorithms
(CS285) Lecture 7 - Value Function Methods
(CS285) Lecture 8 - Deep RL with Q-Functions
(CS285) Lecture 9 - Advanced Policy Gradients
(WIP) (CS285) Lecture 10 - Optimal Control and Planning
(WIP) (CS285) Lecture 18 - Variational Inference and Generative Models
(WIP) (CS285) Lecture 19 - Reframing Control as an Inference Problem
(WIP) (CS285) Lecture 20 - Inverse Reinforcement Learning
(WIP) DDPG, TD3 and SAC
(WIP) Deep dive into TRPO and PPO
(WIP) Distributional RL (Categorical DQN (C51), Quantile Regression DQN (QR-DQN) and so on)
(WIP) Off-Policy RL
(WIP) Offline RL (corresponding to CS285 Lec 15 and 16)
(yet) From AlphaGo to MuZero
Before Studying RL, Why Do We Have to Dive into RL?
Resources
The Graphics Processing Unit (GPU) Revolution (1) - Why Are GPUs Important for Deep Learning?
Useful and Insightful Machine Learning Websites
Speech
(ASR) A Long Way To CTC BeamSearch (1)
DeepLearning
(WIP) Dropout and Bayesian Deep Learning
(WIP) Maximal Update (Mu) Parametrization (μP) and Hyperparameter Transfer (μTransfer)
(WIP) Rethinking Weight Decay and LLM without Bias Term
(WIP) Trivial (but critical) Training Techniques for Neural Networks
(almost) A Comparison of Positional Embedding Methods
(yet) (Paper) LongRoPE
Convolution Families
Course Overview of CSC2541 (Topics in ML - Neural Net Training Dynamics)
Deep Dive into Low Rank Adaptation (LoRA)
Large Batch Training Difficulties
Inspiration
About Processing Data
An Opinionated Guide to ML Research from John Schulman
Difference between Research Scientist vs Engineer
What About DxxxMind's Interviews?
How To Be Successful? from Sam Altman
The Bitter Lesson from Richard Sutton
What Happened on the Way to Getting a Job as an ML Researcher
Pytorch_Implementation
(WIP) Differentiable Sampling for Discrete Distribution
CrossEntropyLoss vs NLL (feat. REINFORCE)
Gradient Clipping
Implementation of Contrastive Loss (InfoNCE Loss)
LLM-RLHF Series (6/6) - Implementation Details of PPO and RLHF
PyTorch Implementation of Variational AutoEncoders (VAEs)
REINFORCE and Actor-Critic
Training_and_Inference_Optimization
(WIP) Blockwise Ring Attention
(WIP) CheatSheet for Training Transformer (FLOPs, Time/Space Complexity)
(WIP) Distributed Training (Parallelism, ZeRO and so on)
(WIP) How Is Mixtral Trained Efficiently?
(WIP) Sequence Generation Techniques (Beam Search and Sampling)
(almost) PyTorch Efficient Scaled Dot Product Attention (SDPA)
(yet) Resources for Torch Internals (Autograd and More), CUDA, Compilers, and So On
Dynamic Batching (Token Batching) for Sequence Dataset with Variable Lengths
Training DNN with Reduced Precision Floating-Point Format
RLHF
(Paper) Distributional Preference Learning (DPL)
(WIP) (Paper) Back to Basics, Revisiting REINFORCE Style Optimization for RLHF
(WIP) (Paper) ODIN, Disentangled Reward Mitigates Hacking in RLHF
(WIP) Aligning LLM with Offline RL
(yet) How to Reinforce Reasoning Capability with a Verifier
(yet) Offline Actor-Critic (Perceiver Actor-Critic; PAC) Algorithm
(yet) Study on Preference-Based Learning