Moe

A monadic multimodal emotion recognition framework to detect and combine emotions in Haskell

Monadic detectors

We need a mechanism for passing a fixed input to several detectors, so we model a detector as a reader monad, that is, a computation which can read values from a shared environment.

data DetectorT r m a = DetectorT {
    runDetectorT :: r -> m a
}

The DetectorT type constructor is parameterised with three types: the type r of the input that the detector takes to analyze emotions (the shared environment, i.e., the channels), an underlying monad m describing the effects of the detector, and the type a of the output produced by the detector. We are interested in collecting the results of external emotion recognition services, so detectors will usually require the IO monad to send HTTP requests.
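The examples below use DetectorT through its Functor, Applicative, Monad and MonadTrans instances, which the library provides. A minimal sketch of what such instances can look like (not necessarily moe's exact code; MonadTrans comes from Control.Monad.Trans.Class) is:

instance Functor m => Functor (DetectorT r m) where
    fmap f (DetectorT g) = DetectorT (fmap f . g)

instance Applicative m => Applicative (DetectorT r m) where
    pure = DetectorT . const . pure
    DetectorT f <*> DetectorT g = DetectorT (\r -> f r <*> g r)

instance Monad m => Monad (DetectorT r m) where
    DetectorT g >>= f = DetectorT (\r -> g r >>= \a -> runDetectorT (f a) r)

-- lift runs an action of the underlying monad, ignoring the shared input
instance MonadTrans (DetectorT r) where
    lift = DetectorT . const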

Channels

We model a channel as a data type that acts as a container for the information to be analyzed by detectors. To add a new channel, we just define a new data type.

newtype Face a = Face { getFaceChannel :: a }
newtype Voice a = Voice { getVoiceChannel :: a }

Emotions

In order to work with generic combinators that are useful for all detectors regardless of their API and the results they return, we define a common interface to extract information about emotions.

PAD

One of the approaches used to represent emotions is the dimensional approach. In this representation, an emotion is expressed as a point in a space of several dimensions, usually three. In this space, each axis represents an aspect of an emotion. The PAD format is one of the most common dimensional approaches, in which an emotion is described with three coordinates. The first coordinate (pleasure, also called valence) states how positive or negative an emotion is; the second coordinate states the level of arousal linked to that emotion; finally, the last coordinate states how dominant or passive the person expressing that emotion feels. Each coordinate is usually expressed as a number between -1 and 1.

data PAD = PAD {
    pleasure :: Double,
    arousal :: Double,
    dominance :: Double
} deriving (Eq, Show, Read)
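As an illustration only (the numbers are made up, not the output of any detector), a mildly positive, low-arousal, fairly dominant state could be written as:

-- Illustrative value; the numbers are made up.
calm :: PAD
calm = PAD { pleasure = 0.6, arousal = -0.3, dominance = 0.4 }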

Ekman

Categorical models were proposed as a more comprehensive system to represent emotions. In this approach, an emotion is represented by a label from a closed set. In practice, these labels are usually paired with a number between 0 and 1 which expresses how confident the detector is about the presence of that emotion in a piece of media. One of the most common sets of labels is the six basic emotions proposed by Ekman, according to whom there are six universally recognised emotions: happiness, sadness, disgust, anger, surprise and fear. Detectors usually include a seventh label, neutral, to indicate the absence of the other six.

data Ekman = Ekman {
    anger :: Double,
    disgust :: Double,
    fear :: Double,
    joy :: Double,
    sadness :: Double,
    surprise :: Double,
    neutral :: Double
} deriving (Eq, Show, Read)
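Such a record can be summarised by picking the label with the highest confidence. The helper below is not part of moe, only a small illustration of how the record can be consumed:

-- Not part of moe: returns the label with the highest confidence.
-- Needs maximumBy from Data.List and comparing from Data.Ord.
dominantLabel :: Ekman -> String
dominantLabel e = fst $ maximumBy (comparing snd)
    [ ("anger", anger e), ("disgust", disgust e), ("fear", fear e)
    , ("joy", joy e), ("sadness", sadness e), ("surprise", surprise e)
    , ("neutral", neutral e) ]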

Emotion typeclass

We define the Emotion typeclass to generalize over data types e that represent emotions, using three functions: toPAD, toEkman and linearMap. The first two convert a value of type e into PAD or Ekman data, respectively. The linearMap function returns the linear mapping M, expressed over the component order [anger, disgust, fear, joy, sadness, surprise, neutral], to be applied when a conversion between representations requires it.
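The class itself looks roughly as follows. The signature of linearMap below is an assumption (a 3x7 matrix over the component order above), since this introduction does not show it:

class Emotion e where
    toPAD     :: e -> PAD
    toEkman   :: e -> Ekman
    -- assumed signature: the mapping to PAD as a 3x7 matrix
    linearMap :: e -> [[Double]]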

Multimodal detectors

We define a right-associative infix operator type (:+:), similar to the built-in tuple type (,), in order to create inputs with more than one channel.

data (f :+: g) = f :+: g deriving (Show)
infixr 1 :+:
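For example, a two-channel input carrying a face image and a voice clip (the paths here are placeholders) can be built by nesting the constructor:

input :: Face String :+: Voice String
input = Face "path/to/img.jpg" :+: Voice "path/to/clip.wav"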

We also define a multi-parameter typeclass (:<:) with a single method, channel, which lets us extract a value of a smaller type from a composite (:+:) value.

class a :<: b where
    channel :: b -> a

instance {-# OVERLAPPING #-} a :<: a where
    channel = id

instance {-# OVERLAPPING #-} a :<: (a :+: b) where
    channel (x :+: _) = channel x

instance {-# OVERLAPPING #-} (a :<: b) => a :<: (c :+: b) where
    channel (_ :+: y) = channel y
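With these instances in place, channel picks the requested component out of a composite input; for instance (the paths are placeholders):

ghci> getVoiceChannel (channel (Face "img.jpg" :+: Voice "clip.wav") :: Voice String)
"clip.wav"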

Instead of defining a detector on a specific channel (such as Face String), we define it on a generic input r that is only required to contain that channel.

faceapp :: Face String :<: r => (String, String) -> DetectorT r IO FaceAppData
faceapp (api_key, api_secret) = mkDetectorT (\input ->
    do r <- try $ post "https://api-us.faceplusplus.com/facepp/v3/detect" [
            "api_key" := pack api_key,
            "api_secret" := pack api_secret,
            "return_attributes" := pack "emotion",
            "image_url" := pack (getFaceChannel $ channel input)] :: IO (Either HttpException (Response ByteString))
       case r of
           Left ex   -> print ex >> empty
           Right val -> do guard $ val ^. responseStatus . statusCode == 200
                           case val ^? responseBody . key "faces" . nth 0 . key "attributes" . key "emotion" of
                               Just json -> return $ fromJust (decode (encode json))
                               Nothing   -> empty)

Examples

Functor

isHappy :: (Functor f, Emotion e) => f e -> f Bool
isHappy = fmap ((> 0.5) . joy . toEkman)

ghci> runDetectorT (isHappy (faceapp ("****","****"))) (Face "path/to/img")
True

Applicative

ghci> let d1 = faceapp ("****","****")
ghci> let d2 = voiceapp ("****","****") (0,5)
ghci> let d3 = (&&) <$> isHappy d1 <*> isHappy d2
ghci> runDetectorT d3 (Face "..." :+: Voice "...")
True

Monad

ghci> runDetectorT (filterM (isHappy . voiceapp "****") [(0,5), (5,10), (10,15)]) (Voice "happy.mp4")
[(0,5), (10,15)]

Monad transformer

iod :: (Face String :<: r, Voice String :<: r) => DetectorT r IO PAD
iod = do api_key <- lift getLine
         api_secret <- lift getLine
         face <- faceapp (api_key,api_secret)
         api_key' <- lift getLine
         voice <- voiceapp api_key'
         let mean = (toPAD face + toPAD voice) / 2
         return mean

dw :: (Face String :<: r, Voice String :<: r) => WriterT [String] (DetectorT r IO) PAD
dw = do face <- lift $ faceapp ("****","****")
        tell ["face: " ++ show face]
        voice <- lift $ voiceapp "****"
        tell ["voice: " ++ show voice]
        let mean = (toPAD face + toPAD voice) / 2
        tell ["mean: " ++ show mean]
        return mean
