Background
Given a set of action videos, how do we recognize which action is being performed?
Understanding
Suppose we have two action videos: one of throwing a ball and one of dribbling (bouncing) a ball. We extract features from each video. First, every video is converted into a fixed number of image frames; for example, each video is sampled at some rate into 20 frames. Take the throwing video: convert it into 20 frames, use image-processing techniques (HOG in the code below) to extract a feature vector from each frame, and label all 20 feature vectors with "0". Do the same for the dribbling video and label its 20 feature vectors with "1". (The SVM's "kernel trick" comes into play later, during classification, not during feature extraction.) With features and labels in hand, we can train a model.
Now take a new action video, convert it into 20 frames as well, extract the features of each frame, and feed them into the trained model. This yields one predicted label per frame; the label that occurs most often (the mode) is taken as the recognized action.
Walk through the code below and sort out the logic for yourself.
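Below is a minimal sketch of this two-class pipeline (it is not the author's program, which is listed in full afterwards): one HOG feature vector per frame, an RBF-kernel SVM, and a majority vote over the predicted frame labels. The frame directories are hypothetical, and frames are resized to the default 64x128 HOG window so that every frame yields a vector of the same length; the full program instead keeps the 15000 largest sorted HOG values of each frame.

# Minimal sketch (not the full program below): two labelled clips, per-frame HOG
# features, an RBF SVM, and a majority vote over the predicted frame labels.
# The frame directories are hypothetical; frames are resized to the default
# HOG window (64x128) so every frame yields a feature vector of the same length.
import glob

import cv2
import numpy as np
from scipy.stats import mode
from sklearn import svm

hog = cv2.HOGDescriptor()  # default 64x128 window, 16x16 blocks, 8x8 cells, 9 bins

def frame_feature(image_path):
    # One fixed-length HOG vector per frame.
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, (64, 128))
    return hog.compute(gray).flatten()

# Hypothetical frame folders: ~20 frames extracted from each clip.
throw_frames   = sorted(glob.glob("frames/throw/*.jpg"))    # label 0
dribble_frames = sorted(glob.glob("frames/dribble/*.jpg"))  # label 1
unknown_frames = sorted(glob.glob("frames/unknown/*.jpg"))  # clip to recognize

X = [frame_feature(p) for p in throw_frames + dribble_frames]
y = [0] * len(throw_frames) + [1] * len(dribble_frames)

clf = svm.SVC(kernel="rbf", gamma=0.01, C=13)  # same parameters as the program below
clf.fit(X, y)

# Predict every frame of the new clip, then take the most frequent label.
frame_labels = clf.predict([frame_feature(p) for p in unknown_frames])
print("Predicted action label:", np.ravel(mode(frame_labels).mode)[0])

The complete program below extends this idea to 13 action categories and adds K-fold cross validation.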
__author__ = 'somnath'
import numpy as np
import cv2
import sys
import os
import glob
from sklearn import svm
from scipy.stats import mode
'''
Program: Sports Action Recognition
Description:
This program performs a sports action recognition task. It first processes the input videos from the UCF
sports action data set. The data set contains 13 different sports actions, each of which contains multiple
videos. A video directory contains a video file and the corresponding frames. I iterate over the different
sports actions and read the video frames from each video directory to extract features. I take an equal
number of videos from each category.
Further, to optimize the process, I sort the HOG features and keep those with the highest gradient values.
I use an SVM classifier for classification and K-fold cross validation for evaluation.
I use the provided image frames for each video, as I ran into issues processing the *.avi files on Mac.
Feature Extraction: I use the Histogram of Oriented Gradients (HOG) method to extract the feature vectors.
HOG: It is constructed by dividing the image into cells and, for each cell, computing the
distribution of intensity gradients or edge directions. Concatenating each of these gradient
orientation histograms yields the HOG.
    hogDescriptor = cv2.HOGDescriptor()
    hist = hogDescriptor.compute(gray)
I use the above two calls to create the HOG descriptor and compute the histogram.
Further, I sort the histogram values and take the largest 15000 values from each frame for evaluation.
Classifier: I use a Support Vector Machine (SVM) classifier. The classifier parameters are set based on the
best results achieved over different runs. The following parameters were chosen after multiple executions:
Parameters:
    gamma = 0.01    Lowering the gamma value gives better results but takes more time; an optimum value was chosen.
    C = 13
    kernel_type = rbf ( default )
    degree = 3 ( default )
Evaluation: It is based on the K-fold cross validation mechanism.
First, I shuffle the feature list, in which the label is stored as the very first element of each
feature vector, to obtain better results. The complete set of shuffled features is divided equally
into k=13 subsets. k-1 subsets are used for training and one subset is used for validation. I repeat
the process k=13 times with different subset combinations for training and validation.
Evaluation Metrics:
At each iteration, the evaluation metrics sensitivity, specificity and accuracy are calculated
based on the True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN) counts.
    Sensitivity ( True Positive Rate ) = TP / ( TP + FN )
    Specificity ( True Negative Rate ) = TN / ( TN + FP )
    Accuracy = ( TP + TN ) / ( TP + FN + FP + TN )
At the end of all cross validation iterations, I average the metrics to get the overall rates.
Testing: I also tested my model to check whether it works with unseen data or videos.
For that, I took one video from "Diving-Side/014", which was correctly predicted by my model.
The result is given below.
'''
sportsActionPath = "D:\\gitWorkingRepos\\Sports-Action-Recognition-master\\Train\\"
#sportsActionPath = "/Users/somnath/MY_PROG/ComputerVision/pa3/Training"
# Sports Action Tag
sportsActionTag = {
    'VolleyballSpiking': 0,
    'WallPushups': 1,
    'Golf-Swing-Front': 2,
    'Golf-Swing-Side': 3,
    'Kicking-Front': 4,
    'Kicking-Side': 5,
    'Lifting': 6,
    'Run-Side': 7,
    'SkateBoarding-Front': 8,
    'Swing-SideAngle': 9,
    'Walk-Front': 10,
    'Swing-Bench': 11,
    'Riding-Horse': 12
}
# Distinct Sports Action Number
sportsActionNumber = len(sportsActionTag)
featuresLimit = 15000
'''
Function Name: featureExtraction()
Input Args : <Sports Action Path>, <Action name>, <Training/ Validation>,
Returns : <Array: Feature List>
Description : This function extracts features from each frame of a video and consolidates them.
              While extracting features, it prepends a label to each feature vector based on the sports
              action type. This makes it possible to keep track of each feature vector and its label
              while the features are shuffled during cross validation.
              - The histogram of oriented gradients (HOG) method is used to extract the features.
                The following cv2 calls are used:
                    hogDescriptor = cv2.HOGDescriptor()
                        - Takes the default parameter values: window size = 64x128, block size = 16x16,
                          block stride = 8x8, cell size = 8x8, bins = 9
                    hist = hogDescriptor.compute(gray)
                        - Returns the histogram values
              - The histogram is sorted and the top 15000 values are kept for evaluation.
              - An equal number of image frames is taken from every video.
'''
def featureExtraction(videoPath, actionName, type):
    # Set the frame path; if a jpeg directory exists, take images from it, otherwise from the video dir
    framePath = videoPath
    if os.path.exists(os.path.join(framePath, "jpeg")):
        framePath = os.path.join(framePath, "jpeg")
    # Extract features
    imageFrames = getListOfDir(framePath)
    #print("DEBUG: Image Frames - ", imageFrames)
    frameCount = 0
    frameIndex = 0
    # Feature list for a video
    videoFeatures = []
    for iFrame in imageFrames:
        frameIndex += 1
        iFrame = os.path.join(framePath, iFrame)
        # Read the frame and convert it to grayscale
        frame = cv2.imread(iFrame)
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # HOG descriptor; by default it uses window size = 64x128, block size = 16x16,
        # block stride = 8x8, cell size = 8x8, bins = 9
        hogDescriptor = cv2.HOGDescriptor()
        # Returns the histogram
        hist = hogDescriptor.compute(gray)
        # Sort the histogram values and keep the largest featuresLimit values
        sortedHogHist = np.sort(hist, axis=None)
        keyFeatures = sortedHogHist[-featuresLimit:]
        # For training data, prepend the action label to the feature vector
        if type == "Trng":
            keyFeatures = np.insert(keyFeatures, 0, sportsActionTag[actionName])
        videoFeatures.append(keyFeatures)
        # Use only the first few frames (lowest number of frames available in a video)
        if frameCount >= 3:
            break
        frameCount += 1
    return videoFeatures
'''
Function Name: getImageList()
Input Args : <Image Directory>
Return : <Array:List of Images>
Description : This function returns the list of image files in a directory.
'''
def getImageList(imageDirectory):
    # Find different types of images
    rImages = glob.glob(imageDirectory + "/*.jpg")
    rImages += glob.glob(imageDirectory + "/*.jpeg")
    rImages += glob.glob(imageDirectory + "/*.png")
    return rImages
'''
Function Name: getListOfDir()
Input Args : < Path >
Return : <Array: List of Directory >
Description : This function returns all non-hidden entries (files or directories) under the specified path
'''
def getListOfDir(path):
    # Read each entry under the given path
    dirs = os.listdir(path)
    filtered_dir = []
    # Remove . , .. and hidden entries
    for dir in dirs:
        if not dir.startswith("."):
            filtered_dir.append(dir)
    return filtered_dir
'''
Function Name: getSportsActionName()
Input Args : < Sports Action Index>
Return : <Sports Action Name>
Description : This function returns the name of Sports Action based on index value
'''
def getSportsActionName(saIndex):
    keys = sportsActionTag.keys()
    for key in keys:
        if saIndex == sportsActionTag[key]:
            return key
'''
Function Name: evaluation()
Input Args : < 1D Array: Truth>, <1D Array: Predicted>, < Sports Action Index>
Return : <Accuracy>,<Sensitivity>,<Specificity>
Description : This function calculates the evaluation metrics sensitivity, specificity and accuracy
based on the True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN) counts.
Sensitivity = ( True Positive Rate) = TP / ( TP + FN )
Specificity = ( True Negative Rate) = TN / ( TN + FP )
Accuracy = ( TP + TN ) / ( TP + FN + FP + TN )
'''
def evaluation(truth, predicted, categoryIndex):
    # TP, FP, FN, TN denote the True Positive, False Positive, False Negative and True Negative counts.
    # They start at 1 rather than 0, which keeps the denominators below from ever being zero.
    TP = 1
    FP = 1
    FN = 1
    TN = 1
    # Categories are Sports Action 1 => 0, Sports Action 2 => 1, Sports Action 3 => 2, etc.
    for fIndex in range(len(truth)):
        # Positive prediction for this feature
        if int(predicted[fIndex]) == categoryIndex:
            # TP => when P[i] == T[i] == Ci
            if int(truth[fIndex]) == int(predicted[fIndex]):
                TP += 1
            else:
                FP += 1
        else:  # Negative prediction
            if int(truth[fIndex]) == categoryIndex:
                FN += 1
            else:
                TN += 1
    # Sensitivity - True Positive Rate - Recall
    sensitivity = TP / float(TP + FN)
    # Specificity - True Negative Rate
    specificity = TN / float(TN + FP)
    # Accuracy
    accuracy = (TP + TN) / float(TP + FP + FN + TN)
    return sensitivity, specificity, accuracy
'''
Function Name: crossValidation()
Input Args : <Array: Feature and label list - the first element of each vector is the action label, the rest are the features>
Return : None
Description : It performs K-fold cross validation.
First, I shuffle the feature list, in which the label is stored as the very first element of each
feature vector, to obtain better results. The complete set of shuffled features is divided equally
into k=13 subsets. k-1 subsets are used for training and one subset is used for validation. I repeat
the process k=13 times with different subset combinations for training and validation.
Evaluation Metrics:
At each iteration, evaluation metrics sensitivity, specificity and accuracy are calculated
based on True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN) rates.
Sensitivity = ( True Positive Rate) = TP / ( TP + FN )
Specificity = ( True Negative Rate) = TN / ( TN + FP )
Accuracy = ( TP + TN ) / ( TP + FN + FP + TN )
At the end of all iterations of cross validation, I average them all to get average rate.
'''
def crossValidation(featureAndLabelList):
    # Randomize the sample order
    np.random.shuffle(featureAndLabelList)
    # Evaluation metrics
    sensitivity = 0.0
    specificity = 0.0
    accuracy = 0.0
    # Split the feature set into as many equal subsets as there are sports actions (k = 13)
    subsetLength = len(featureAndLabelList) // sportsActionNumber
    for rIndex in range(sportsActionNumber):
        print("INFO: Cross Validation Iteration - ", rIndex)
        trainingSet = []
        validationSet = []
        feature = []
        label = []
        if rIndex == 0:
            trainingSet = featureAndLabelList[1 * subsetLength:]
            validationSet = featureAndLabelList[0:subsetLength]
        elif rIndex == (sportsActionNumber - 1):
            trainingSet = featureAndLabelList[:(sportsActionNumber - 1) * subsetLength]
            validationSet = featureAndLabelList[(sportsActionNumber - 1) * subsetLength:]
        else:
            trainingSet = np.concatenate((featureAndLabelList[:rIndex * subsetLength], featureAndLabelList[(rIndex + 1) * subsetLength:]), axis=0)
            validationSet = featureAndLabelList[rIndex * subsetLength:(rIndex + 1) * subsetLength]
        # Separate the labels and features of the training set
        for featureAndLabel in trainingSet:
            label.append(int(featureAndLabel[0]))
            feature.append((np.delete(featureAndLabel, 0)).tolist())
        # Train the model
        print("INFO: Training ... ")
        clf = svm.SVC(gamma=0.01, C=13)
        clf.fit(feature, label)
        # Prepare the validation features and ground-truth labels
        print("INFO: Prediction for ", getSportsActionName(rIndex))
        vFeatureList = []
        vLabelList = []  # Ground truth
        for featureAndLabel in validationSet:
            vFeatureList.append(featureAndLabel[1:].tolist())
            vLabelList.append(featureAndLabel[0])
        # Predict the class labels for the validation feature list
        predictedLabel = clf.predict(vFeatureList)
        # Evaluate the predictions
        print("INFO: Evaluating ... ")
        #print("\t Truth - ", vLabelList)
        #print("\t Predicted - ", str(predictedLabel.tolist()))
        # Evaluation: <Truth>, <Predicted>, <Sports Action Index>
        (sen, spec, accu) = evaluation(vLabelList, predictedLabel.tolist(), rIndex)
        sensitivity += sen
        specificity += spec
        accuracy += accu
        print("\t Sensitivity : ", sen)
        print("\t Specificity : ", spec)
        print("\t Accuracy    : ", accu)
    # Average evaluation metrics
    avgSensitivity = sensitivity / sportsActionNumber
    avgSpecificity = specificity / sportsActionNumber
    avgAccuracy = accuracy / sportsActionNumber
    print(" *** Overall Evaluation ***")
    print(" Average Sensitivity: ", avgSensitivity)
    print(" Average Specificity: ", avgSpecificity)
    print(" Average Accuracy   : ", avgAccuracy)
def main():
    print("INFO: Action Recognition")
    sportsActionList = getListOfDir(sportsActionPath)
    print("INFO: Sports Action - ", sportsActionList)
    sportsActionFeatures = []
    firstActionFlag = 0
    for sportsActionName in sportsActionList:
        sportsActionDir = sportsActionPath + sportsActionName
        # Get the list of videos for this sports action
        videoList = getListOfDir(sportsActionDir)
        print("INFO: Video List:", videoList)
        videoCount = 1
        videoFeatures = []
        # For every video in this action category
        for video in videoList:
            # For better results, use the same number of videos from each sports action
            # and the same number of frames from each video
            if videoCount > 2:
                break
            # Complete path of the video directory containing the jpeg images
            videoPath = os.path.join(sportsActionDir, video)
            #print("\tVideo Path:", videoPath)
            # Extract features
            videoFeatures = featureExtraction(videoPath, sportsActionName, 'Trng')
            # Put together the features of all the videos
            if firstActionFlag == 0:
                sportsActionFeatures = videoFeatures
                firstActionFlag = 1
            else:
                sportsActionFeatures = np.concatenate((sportsActionFeatures, videoFeatures), axis=0)
            videoCount += 1
    ## K-Fold Cross Validation method
    #crossValidation(sportsActionFeatures)
    ## **** Testing with unseen data **** ##
    np.random.shuffle(sportsActionFeatures)
    label = []
    feature = []
    # Separate the labels and features
    for featureAndLabel in sportsActionFeatures:
        label.append(int(featureAndLabel[0]))
        feature.append((np.delete(featureAndLabel, 0)).tolist())
    # Train the model on all training features
    print("INFO: Training ... ")
    clf = svm.SVC(gamma=0.01, C=13)
    clf.fit(feature, label)
    # Test path (the action name argument is ignored when type is 'Test')
    tPath = "D:\\gitWorkingRepos\\Sports-Action-Recognition-master\\Test\\"
    vFeatures = featureExtraction(tPath, sportsActionName, 'Test')
    predictedLabels = clf.predict(vFeatures)
    #print("Predicted Labels:", predictedLabels)
    # Take the most frequent frame label as the recognized action
    predictedLabelMode = (mode(predictedLabels))[0]
    print("\t Predicted Sports Action:{0} - {1}".format(predictedLabelMode, getSportsActionName(predictedLabelMode)))

if __name__ == "__main__":
    main()
'''
RESULT:
INFO: Cross Validation Iteration - 0
INFO: Training ...
INFO: Prediction for Diving-Side
INFO: Evaluating ...
Sensitivity : 0.692307692308
Specificity : 0.963636363636
Accuracy : 0.934959349593
INFO: Cross Validation Iteration - 1
INFO: Training ...
INFO: Prediction for Golf-Swing-Back
INFO: Evaluating ...
Sensitivity : 0.272727272727
Specificity : 0.910714285714
Accuracy : 0.853658536585
INFO: Cross Validation Iteration - 2
INFO: Training ...
INFO: Prediction for Golf-Swing-Front
INFO: Evaluating ...
Sensitivity : 0.5
Specificity : 0.965811965812
Accuracy : 0.943089430894
INFO: Cross Validation Iteration - 3
INFO: Training ...
INFO: Prediction for Golf-Swing-Side
INFO: Evaluating ...
Sensitivity : 0.9
Specificity : 0.946902654867
Accuracy : 0.943089430894
INFO: Cross Validation Iteration - 4
INFO: Training ...
INFO: Prediction for Kicking-Front
INFO: Evaluating ...
Sensitivity : 0.2
Specificity : 0.982300884956
Accuracy : 0.918699186992
INFO: Cross Validation Iteration - 5
INFO: Training ...
INFO: Prediction for Kicking-Side
INFO: Evaluating ...
Sensitivity : 0.1
Specificity : 0.982300884956
Accuracy : 0.910569105691
INFO: Cross Validation Iteration - 6
INFO: Training ...
INFO: Prediction for Lifting
INFO: Evaluating ...
Sensitivity : 0.888888888889
Specificity : 0.973684210526
Accuracy : 0.967479674797
INFO: Cross Validation Iteration - 7
INFO: Training ...
INFO: Prediction for Run-Side
INFO: Evaluating ...
Sensitivity : 0.583333333333
Specificity : 0.90990990991
Accuracy : 0.878048780488
INFO: Cross Validation Iteration - 8
INFO: Training ...
INFO: Prediction for SkateBoarding-Front
INFO: Evaluating ...
Sensitivity : 0.3
Specificity : 0.955752212389
Accuracy : 0.90243902439
INFO: Cross Validation Iteration - 9
INFO: Training ...
INFO: Prediction for Swing-SideAngle
INFO: Evaluating ...
Sensitivity : 0.46511627907
Specificity : 0.934090909091
Accuracy : 0.892339544513
INFO: Cross Validation Iteration - 10
INFO: Training ...
INFO: Prediction for Walk-Front
INFO: Evaluating ...
Sensitivity : 0.363636363636
Specificity : 0.955357142857
Accuracy : 0.90243902439
INFO: Cross Validation Iteration - 11
INFO: Training ...
INFO: Prediction for Swing-Bench
INFO: Evaluating ...
Sensitivity : 0.8
Specificity : 0.940677966102
Accuracy : 0.934959349593
INFO: Cross Validation Iteration - 12
INFO: Training ...
INFO: Prediction for Riding-Horse
INFO: Evaluating ...
Sensitivity : 0.9
Specificity : 0.902654867257
Accuracy : 0.90243902439
*** Overall Evaluation ***
Average Sensitivity: 0.535846909997
Average Specificity: 0.947984173698
Average Accuracy : 0.914169958709
### Testing with unseen data or video which has not been used for training
Test Video: /Users/somnath/MY_PROG/ComputerVision/PA3/ucf_sports_actions/ucf_action/Diving-Side/014
INFO: Training ...
Predicted Sports Action:[0] - Diving-Side
'''
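As a side note (not part of the original program): the hand-rolled K-fold loop in crossValidation() can also be expressed with scikit-learn's cross-validation helpers. The sketch below is only an illustration, under the assumption that featureAndLabelList is the array built in main(), with the action label in the first column and the sorted HOG values after it; it reports per-fold and average accuracy only, not sensitivity or specificity.

# Sketch only: reproduce the 13-fold accuracy evaluation with scikit-learn helpers.
# Assumes featureAndLabelList is the array built in main(): column 0 holds the
# action label, the remaining columns hold the sorted HOG values.
import numpy as np
from sklearn import svm
from sklearn.model_selection import KFold, cross_val_score

def sklearnCrossValidation(featureAndLabelList, k=13):
    data = np.asarray(featureAndLabelList)
    y = data[:, 0].astype(int)   # action labels
    X = data[:, 1:]              # HOG feature values
    clf = svm.SVC(kernel="rbf", gamma=0.01, C=13)
    # KFold with shuffling mirrors the shuffle-then-slice folds of crossValidation()
    folds = KFold(n_splits=k, shuffle=True)
    scores = cross_val_score(clf, X, y, cv=folds)  # accuracy per fold
    print("Per-fold accuracy :", scores)
    print("Average accuracy  :", scores.mean())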