| The Datawhale group study tasks are based on the Chinese version of Lee’s ML course, so I am trying to write English notes for the course. To do so, I watched the English version of Lee’s ML course on YouTube and wrote these blog posts as a complement to the Chinese notes on the course. Task01 covers P1 and P2, which are roughly about the training steps of ML.
  Function: The function takes the features as input and outputs the result. Its internal structure can be very complex.

  Loss function: The loss function can be defined in various forms. Our ultimate goal is always to minimize the loss function.

  Optimization: Optimization is the process of minimizing the loss function. The basic algorithm for this is gradient descent. The process can be briefly described as follows:
  $$w \leftarrow w - \eta \frac{\partial L}{\partial w}$$

  $w$ is the weight, which can be thought of as a parameter of the function. $L$ is the loss function we defined.

  $\eta$ is the so-called learning rate, which decides how big a step each iteration takes.
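
As a minimal sketch of the update rule for a single parameter, the snippet below fits the one-parameter model y = w * x with a mean squared error loss. The model, the toy data, and the names `loss`, `grad` and `gradient_descent` are my own assumptions for illustration; they are not taken from the course.

```python
def loss(w, xs, ys):
    """Mean squared error of the assumed one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad(w, xs, ys):
    """Derivative dL/dw of the mean squared error above."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(w, xs, ys, eta=0.05, steps=200):
    """Repeat the update w <- w - eta * dL/dw for a fixed number of steps."""
    for _ in range(steps):
        w = w - eta * grad(w, xs, ys)
    return w


# Toy data (hypothetical) generated by y = 2 * x, so the best weight is 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_fit = gradient_descent(0.0, xs, ys)
print(w_fit, loss(w_fit, xs, ys))
```

With a much larger `eta` the steps would overshoot and the loss would grow instead of shrinking, which is why the learning rate has to be chosen with some care.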
 Gradient descent has a major weakness: it can get stuck in a local minimum, in which case it fails to reach the best solution, namely the global minimum.
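
To make the local minimum problem concrete, here is a small sketch on a hand-picked loss curve, L(w) = w^4 - 3w^2 + w, which has two valleys. The curve, the starting points and the function names are assumptions chosen for illustration only.

```python
def loss(w):
    """A toy loss curve with one local and one global minimum."""
    return w ** 4 - 3 * w ** 2 + w


def grad(w):
    """Derivative of the toy loss: dL/dw = 4w^3 - 6w + 1."""
    return 4 * w ** 3 - 6 * w + 1


def gradient_descent(w, eta=0.01, steps=1000):
    """Plain gradient descent: w <- w - eta * dL/dw."""
    for _ in range(steps):
        w = w - eta * grad(w)
    return w


# Same algorithm, same learning rate, two different starting points.
w_right = gradient_descent(2.0)   # settles in the shallower valley near w = 1.13
w_left = gradient_descent(-2.0)   # settles in the deeper valley near w = -1.30

print(w_right, loss(w_right))
print(w_left, loss(w_left))
```

Both runs stop where the derivative is zero, but only the run that starts on the left reaches the lower loss. Which minimum gradient descent finds depends on where it starts.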
 When there is more than one parameter, just calculate one partial derivative per parameter in each iteration and update all parameters together. The procedure is almost the same as in the one-parameter case.
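
A sketch of the two-parameter case, assuming the model y = w * x + b with a mean squared error loss. The model, the toy data and the names used here are illustrative assumptions, not the course's own code.

```python
def gradients(w, b, xs, ys):
    """Partial derivatives dL/dw and dL/db of the mean squared error."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def gradient_descent(w, b, xs, ys, eta=0.05, steps=2000):
    """Update both parameters in every iteration, each with its own derivative."""
    for _ in range(steps):
        # Compute both derivatives first, then update, so that the update
        # of b does not see the already-updated value of w.
        dw, db = gradients(w, b, xs, ys)
        w = w - eta * dw
        b = b - eta * db
    return w, b


# Toy data (hypothetical) generated by y = 2 * x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w_fit, b_fit = gradient_descent(0.0, 0.0, xs, ys)
print(w_fit, b_fit)
```

The only change from the one-parameter version is that each iteration computes one derivative per parameter.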
 |