
A Brief Overview of How the Vue Ecosystem Works

There are plenty of articles online that walk through the Vue source, but far fewer that cover Vuex and vue-router. This post first describes Vue's overall code structure, uses two examples to trace Vue instantiation and the SSR flow, then explains how Vue plugins are registered and roughly how Vuex and vue-router are implemented.

Vue

Stepping through the Vue source with an example

1. Clone the vue repository locally
2. Write whatever example you want to test in the examples directory
3. Run npm run dev && node e2e/test/runner.js

Directory structure

  • root/
    • compiler/------------------parses the template into an AST and generates the render function
      • parser/------------------walks the template string with regexes, using a stack to track element nesting, and builds the AST
      • codegen/-----------------generates the render function from the AST
      • directives/--------------turns directives such as v-bind and v-model in the AST into render-function code
      • index.js
    • core/----------------------everything around the Vue instance; the heart of the source
      • components/--------------built-in components, e.g. keep-alive
      • global-api/--------------registers static methods on the Vue constructor, e.g. Vue.use, Vue.set...
      • instance/----------------the constructor and the methods on Vue.prototype
      • observer/----------------the reactivity system, made up of Watcher, Observer and Dep
      • util/--------------------utilities
      • vdom/--------------------VNode-related code, including createElement, patch, etc.
      • index.js
    • platforms/-----------------extensions built on top of core
      • web/---------------------wraps core for the web platform; e.g. Vue.prototype.$mount wraps the mounting logic in core
      • weex/
    • server/--------------------SSR: runs the Vue code to get a Vue instance, and outputs a stream or string; renderNode turns the VNodes into HTML tags
    • shared/--------------------utilities shared by all of the above
      • util.js

The Vue constructor

When we use Vue we always start by instantiating a Vue object, so let's begin with the constructor. Almost all constructor and prototype code lives under core/instance.

As for finding where the Vue constructor is defined: work backwards, starting from package.json and following the build entries step by step.

First look at core/instance/index.js, which defines the Vue constructor and installs a number of methods on Vue.prototype.

import { initMixin } from './init'
import { stateMixin } from './state'
import { renderMixin } from './render'
import { eventsMixin } from './events'
import { lifecycleMixin } from './lifecycle'
import { warn } from '../util/index'

function Vue (options) {
  if (process.env.NODE_ENV !== 'production' &&
    !(this instanceof Vue)) {
    warn('Vue is a constructor and should be called with the `new` keyword')
  }
  /* initialize */
  this._init(options)
}

initMixin(Vue)
stateMixin(Vue)
eventsMixin(Vue)
lifecycleMixin(Vue)
renderMixin(Vue)

export default Vue

initMixin does exactly one thing: it adds the _init method to Vue's prototype. Constructing a Vue instance calls _init to initialize it; the "common usage" section below covers this in detail.

stateMixin mainly declares Vue.prototype.$data, Vue.prototype.$props, Vue.prototype.$set, Vue.prototype.$delete and Vue.prototype.$watch:

export function stateMixin (Vue: Class<Component>) {
  // flow somehow has problems with directly declared definition object
  // when using Object.defineProperty, so we have to procedurally build up
  // the object here.
  const dataDef = {}
  dataDef.get = function () { return this._data }
  const propsDef = {}
  propsDef.get = function () { return this._props }
  ......
  Object.defineProperty(Vue.prototype, '$data', dataDef)
  Object.defineProperty(Vue.prototype, '$props', propsDef)

  Vue.prototype.$set = set
  Vue.prototype.$delete = del
  
  // reactivity-related; explained in detail later
  Vue.prototype.$watch = function (
    expOrFn: string | Function,
    cb: Function,
    options?: Object
  ): Function {
    ......
  }
}

eventsMixin defines Vue.prototype.$on/$off/$once. The idea is the observer pattern: each event keeps its own queue of handlers, stored in vm._events.
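A minimal standalone sketch of that pattern (illustrative only, not Vue's actual code; TinyEmitter is a made-up name):

```javascript
// Minimal sketch of the $on/$off/$once/$emit pattern in eventsMixin:
// each event name maps to an array of handlers stored on the instance,
// mirroring how Vue keeps them in vm._events.
class TinyEmitter {
  constructor () {
    this._events = Object.create(null)
  }
  $on (event, fn) {
    (this._events[event] || (this._events[event] = [])).push(fn)
    return this
  }
  $off (event, fn) {
    const cbs = this._events[event]
    if (cbs) this._events[event] = cbs.filter(cb => cb !== fn)
    return this
  }
  $once (event, fn) {
    const on = (...args) => {
      this.$off(event, on)      // unsubscribe before the first call
      fn.apply(this, args)
    }
    this.$on(event, on)
    return this
  }
  $emit (event, ...args) {
    (this._events[event] || []).forEach(cb => cb.apply(this, args))
    return this
  }
}
```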

lifecycleMixin defines Vue.prototype._update, which we rely on constantly: whenever a component's data changes (or anything else forces a re-render), Vue calls it to diff and patch the VNodes.

export function lifecycleMixin (Vue: Class<Component>) {
  /* update the node */
  Vue.prototype._update = function (vnode: VNode, hydrating?: boolean) {
    const vm: Component = this
    /* if the component has already been mounted, this is an update, so fire the beforeUpdate hook */
    if (vm._isMounted) {
      callHook(vm, 'beforeUpdate')
    }
    const prevEl = vm.$el
    const prevVnode = vm._vnode
    const prevActiveInstance = activeInstance
    activeInstance = vm
    vm._vnode = vnode
    // Vue.prototype.__patch__ is injected in entry points
    // based on the rendering backend used.
    /* __patch__ is injected by each platform entry point as the patch function */
    if (!prevVnode) {
      // initial render
      vm.$el = vm.__patch__(
        vm.$el, vnode, hydrating, false /* removeOnly */,
        vm.$options._parentElm,
        vm.$options._refElm
      )
    } else {
      // updates
      vm.$el = vm.__patch__(prevVnode, vnode)
    }
    activeInstance = prevActiveInstance
    // update __vue__ reference
    /* update the __vue__ reference on the new element */
    if (prevEl) {
      prevEl.__vue__ = null
    }
    if (vm.$el) {
      vm.$el.__vue__ = vm
    }
    // if parent is an HOC, update its $el as well
    if (vm.$vnode && vm.$parent && vm.$vnode === vm.$parent._vnode) {
      vm.$parent.$el = vm.$el
    }
    // updated hook is called by the scheduler to ensure that children are
    // updated in a parent's updated hook.
  }

  Vue.prototype.$forceUpdate = function () {
    const vm: Component = this
    if (vm._watcher) {
      vm._watcher.update()
    }
  }

  Vue.prototype.$destroy = function () {
    const vm: Component = this
    if (vm._isBeingDestroyed) {
      return
    }
    /* fire the beforeDestroy hook */
    callHook(vm, 'beforeDestroy')
    /* mark destruction as in progress */
    vm._isBeingDestroyed = true
    // remove self from parent
    const parent = vm.$parent
    if (parent && !parent._isBeingDestroyed && !vm.$options.abstract) {
      remove(parent.$children, vm)
    }
    // teardown watchers
    /* release every Watcher of this component from the Deps it belongs to */
    if (vm._watcher) {
      vm._watcher.teardown()
    }
    let i = vm._watchers.length
    while (i--) {
      vm._watchers[i].teardown()
    }
    // remove reference from data ob
    // frozen object may not have observer.
    if (vm._data.__ob__) {
      vm._data.__ob__.vmCount--
    }
    // call the last hook...
    vm._isDestroyed = true
    // invoke destroy hooks on current rendered tree
    vm.__patch__(vm._vnode, null)
    // fire destroyed hook
    /* fire the destroyed hook */
    callHook(vm, 'destroyed')
    // turn off all instance listeners.
    /* remove all event listeners */
    vm.$off()
    // remove __vue__ reference
    if (vm.$el) {
      vm.$el.__vue__ = null
    }
    // remove reference to DOM nodes (prevents leak)
    vm.$options._parentElm = vm.$options._refElm = null
  }
}

renderMixin defines Vue.prototype._render and friends. _render() calls the render function passed in at instantiation to produce a VNode, and is usually paired with Vue.prototype._update:

// called when the component updates
vm._update(vm._render(), hydrating);

How common usage works

Creating an instance

// boot up the demo
var demo = new Vue({
  el: '#demo',
  data: {
    treeData: data,
    a: 1
  },
  computed: {
    hello() {
      return this.treeData;
    }
  },
  render(createElement) {
    // @returns {VNode}
    return createElement(
      // {String | Object | Function}
      // an HTML tag name, a component options object, or a function
      // returning one of those; required
      'div',
      // {Object}
      // a data object with template-related attributes, so those
      // attributes can be used in the template; optional
      {
        // (see the next section for details)
      },
      // {String | Array}
      // child nodes (VNodes), built with createElement(),
      // or strings to create text nodes; optional
      [
        // createElement(Profile3),
        '先写一些文字',
        createElement('h1', '一则头条'),
        // createElement(Profile),
        // createElement(Profile4)
      ]
    )
  }
})

new Vue() simply calls this._init() inside the constructor, and this._init() is the Vue.prototype._init declared in instance/init.js:

export function initMixin (Vue: Class<Component>) {
  Vue.prototype._init = function (options?: Object) {
    const vm: Component = this
    ......
    
    // expose real self
    vm._self = vm
    /* initialize the lifecycle */
    initLifecycle(vm)
    /* initialize events */
    initEvents(vm)
    /* initialize render */
    initRender(vm)
    /* fire the beforeCreate hook */
    callHook(vm, 'beforeCreate')
    initInjections(vm) // resolve injections before data/props
    /* initialize props, methods, data, computed and watch */
    initState(vm)
    initProvide(vm) // resolve provide after data/props
    /* fire the created hook */
    callHook(vm, 'created')

    /* istanbul ignore if */
    if (process.env.NODE_ENV !== 'production' && config.performance && mark) {
      /* format the component name */
      vm._name = formatComponentName(vm, false)
      mark(endTag)
      measure(`${vm._name} init`, startTag, endTag)
    }

    if (vm.$options.el) {
      /* mount the component */
      vm.$mount(vm.$options.el)
    }
  }
}

initLifecycle mainly pushes the new instance into parent.$children:

/* initialize the lifecycle */
export function initLifecycle (vm: Component) {
  const options = vm.$options

  // locate first non-abstract parent
  /* store vm in its first non-abstract parent, skipping abstract components such as keep-alive */
  let parent = options.parent
  if (parent && !options.abstract) {
    while (parent.$options.abstract && parent.$parent) {
      parent = parent.$parent
    }
    parent.$children.push(vm)
  }

  ......
}

initEvents initializes vm._events, which stores the event handlers; $on() registers listeners there:

/* initialize events */
export function initEvents (vm: Component) {
  /* create an _events object on vm to store event handlers */
  vm._events = Object.create(null)
	......
}

initRender defines vm.$createElement; when Vue calls our render() function, the argument it passes in is vm.$createElement:

/* initialize render */
export function initRender (vm: Component) {
 ......
  /* bind createElement to this instance; vm is captured in the closure, so the render context is always correct */
  vm._c = (a, b, c, d) => createElement(vm, a, b, c, d, false)
  // normalization is always applied for the public version, used in
  // user-written render functions.
  vm.$createElement = (a, b, c, d) => createElement(vm, a, b, c, d, true)
}

initState mainly runs initProps, initComputed and initData. Let's start with initData, and use it to walk through Vue's reactivity system.

initData calls the observe method in observer/index.js, which creates an Observer; the Observer walks the data and makes every property reactive.

In initData itself, the key part is the observe() call on the last line:

/* initData */
function initData (vm: Component) {
  /* get the data option */
  let data = vm.$options.data
  data = vm._data = typeof data === 'function'
    ? getData(data, vm)
    : data || {}

  /* strict object check: true only for plain JavaScript objects */
  if (!isPlainObject(data)) {
    data = {}
    process.env.NODE_ENV !== 'production' && warn(
      'data functions should return an object:\n' +
      'https://vuejs.org/v2/guide/components.html#data-Must-Be-a-Function',
      vm
    )
  }
  // proxy data on instance
  /* iterate over the data object */
  const keys = Object.keys(data)
  const props = vm.$options.props
  let i = keys.length

  // walk the keys of data
  while (i--) {

    /* keys in data must not duplicate keys in props; props take priority, and a conflict produces a warning */
    if (props && hasOwn(props, keys[i])) {
      process.env.NODE_ENV !== 'production' && warn(
        `The data property "${keys[i]}" is already declared as a prop. ` +
        `Use prop default value instead.`,
        vm
      )
    } else if (!isReserved(keys[i])) {
      /* skip reserved keys */

      /* the proxying mentioned earlier: expose the properties of _data directly on the vm instance */
      proxy(vm, `_data`, keys[i])
    }
  }
  // observe data
  /* from here on we observe: bind the data, recursing to bind nested objects too */
  observe(data, true /* asRootData */)
}

observe() creates a new Observer, and the Observer calls defineReactive on everything in data to make it reactive. The core mechanism is Object.defineProperty: the getter registers dependencies (the observers), and the setter notifies them:

/* define a property on the object that notifies observers when it changes */
export function defineReactive (
  obj: Object,
  key: string,
  val: any,
  customSetter?: Function
) {
  /* a dep object held in the closure */
  const dep = new Dep()

  const property = Object.getOwnPropertyDescriptor(obj, key)
  if (property && property.configurable === false) {
    return
  }

  /* if the object already had a getter/setter, keep them and call them from the new ones, so existing accessors are not clobbered */
  // cater for pre-defined getter/setters
  const getter = property && property.get
  const setter = property && property.set

  /* recursively observe child objects and keep the child's Observer */
  let childOb = observe(val)
  Object.defineProperty(obj, key, {
    enumerable: true,
    configurable: true,
    get: function reactiveGetter () {
      /* call the original getter if there was one */
      const value = getter ? getter.call(obj) : val
      if (Dep.target) {
        /* collect the dependency */
        dep.depend()
        if (childOb) {
          /* collect on the child object too: the same watcher ends up in two deps, the one in this closure and the child's */
          childOb.dep.depend()
        }
        if (Array.isArray(value)) {
          /* for arrays, collect on every member, recursing into nested arrays */
          dependArray(value)
        }
      }
      return value
    },
    set: function reactiveSetter (newVal) {
      /* read the current value via the getter and bail out if the new value is identical */
      const value = getter ? getter.call(obj) : val
      /* eslint-disable no-self-compare */
      if (newVal === value || (newVal !== newVal && value !== value)) {
        return
      }
      /* eslint-enable no-self-compare */
      if (process.env.NODE_ENV !== 'production' && customSetter) {
        customSetter()
      }
      if (setter) {
        /* call the original setter if there was one */
        setter.call(obj, newVal)
      } else {
        val = newVal
      }
      /* observe the new value so it stays reactive */
      childOb = observe(newVal)
      /* notify every observer registered in dep */
      dep.notify()
    }
  })
}

When declaring a component we often use watch, which ultimately calls new Watcher(expr, callback); a Watcher is an observer, a subscriber, and its code is easy to follow. The interesting question is how a Watcher gets connected to the Observer: when a reactive property's setter runs, how does it find the right watchers? That is what Dep is for. Note pushTarget() inside get() below: it sets the watcher itself on a module-level variable in dep (Dep.target), then calls this.getter.call(vm, vm), which triggers the reactive getter. The getter reads Dep.target and adds that watcher to its own dep. When the setter later fires, the dep iterates its stored watchers and calls each one's run() method, which invokes the callback.

The Watcher and Dep code follows.

 /*
    An observer that parses an expression and collects dependencies, firing a
    callback when the expression's value changes. Used by the $watch API and by directives.
 */
export default class Watcher {

  constructor (
    vm: Component,
    expOrFn: string | Function,
    cb: Function,
    options?: Object
  ) {
    this.vm = vm
    /* _watchers stores this vm's subscriber instances */
    vm._watchers.push(this)
    ......
    this.value = this.lazy
      ? undefined
      : this.get()
  }

  /**
   * Evaluate the getter, and re-collect dependencies.
   */
   /* evaluate the getter and re-collect dependencies */
  get () {
  	......
  	
  	/* set this watcher as Dep.target, so dependencies are collected on it */
    pushTarget(this)
    
    ......
    
    value = this.getter.call(vm, vm)
    
    ......
    if (this.deep) {
      /* recurse into every nested object/array, triggering their getters so each member is collected as a dependency: a "deep" dependency */
      traverse(value)
    }
    }

    /* pop this watcher off the target stack and restore Dep.target */
    popTarget()
    this.cleanupDeps()
    return value
  }

  
   /*
      scheduler interface: invoked as a callback by the scheduler.
    */
  run () {
  	......
    this.cb.call(this.vm, value, oldValue)
    ......
  }

  /**
   * Evaluate the value of the watcher.
   * This only gets called for lazy watchers.
   */
  evaluate () {
    this.value = this.get()
    this.dirty = false
  }
}

export default class Dep {
  constructor () {
    this.id = uid++
    this.subs = []
  }

  /* add an observer */
  addSub (sub: Watcher) {
    this.subs.push(sub)
  }

  /* remove an observer */
  removeSub (sub: Watcher) {
    remove(this.subs, sub)
  }

  /* dependency collection: register Dep.target as an observer when it exists */
  depend () {
    if (Dep.target) {
      Dep.target.addDep(this)
    }
  }

  /* notify all subscribers */
  notify () {
    // stabilize the subscriber list first
    const subs = this.subs.slice()
    for (let i = 0, l = subs.length; i < l; i++) {
      subs[i].update()
    }
  }
}

/* after dependency collection Dep.target is set back to null, preventing duplicate registration */
Dep.target = null
const targetStack = []

/* set a watcher as Dep.target for dependency collection, pushing any previous target onto the stack */
export function pushTarget (_target: Watcher) {
  if (Dep.target) targetStack.push(Dep.target)
  Dep.target = _target
}

/* pop the stack and restore the previous Dep.target */
export function popTarget () {
  Dep.target = targetStack.pop()
}

To sum up initData and Vue's reactive data (the original post includes a diagram here):

1. observe(data) => defineReactive
2. watch(a, callback) => new Watcher() => pushTarget(this) => getter.call()
3. the getter() runs dep.depend(), collecting the watcher set by pushTarget
4. when a = 3 executes, every watcher stored in the dep runs its callback.
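The four steps can be sketched end to end in a deliberately tiny model of the Dep/Watcher handshake (no nested objects or arrays; everything beyond the core concepts is illustrative, not Vue's real code):

```javascript
// Tiny model of Vue's reactivity: the Dep collects watchers in the getter
// and notifies them in the setter. activeWatcher plays the role of Dep.target.
let activeWatcher = null

class Dep {
  constructor () { this.subs = [] }
  depend () { if (activeWatcher) this.subs.push(activeWatcher) }
  notify () { this.subs.forEach(w => w.update()) }
}

function defineReactive (obj, key, val) {
  const dep = new Dep()
  Object.defineProperty(obj, key, {
    get () { dep.depend(); return val },        // collect on read
    set (newVal) { val = newVal; dep.notify() } // notify on write
  })
}

class Watcher {
  constructor (obj, key, cb) {
    this.obj = obj
    this.key = key
    this.cb = cb
    activeWatcher = this      // pushTarget(this)
    this.value = obj[key]     // triggers the getter, which runs dep.depend()
    activeWatcher = null      // popTarget()
  }
  update () {
    const oldValue = this.value
    this.value = this.obj[this.key]
    this.cb(this.value, oldValue)
  }
}

// wire it up: observe, watch, then mutate
const data = {}
defineReactive(data, 'a', 1)
const observed = []
new Watcher(data, 'a', (newVal, oldVal) => observed.push([newVal, oldVal]))
data.a = 3
```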

Next, initComputed. It simply creates a Watcher; when the computed function runs, it calls the getters of every piece of data it depends on, which adds the watcher to those dependencies' deps.

/* initialize computed */
function initComputed (vm: Component, computed: Object) {
  const watchers = vm._computedWatchers = Object.create(null)

  for (const key in computed) {
    const userDef = computed[key]
    /*
      a computed property may be a function, or an object with get and set.
      See https://cn.vuejs.org/v2/guide/computed.html#计算-setter
    */
    let getter = typeof userDef === 'function' ? userDef : userDef.get
    if (process.env.NODE_ENV !== 'production') {
      /* if there is no getter, warn and fall back to a noop */
      if (getter === undefined) {
        warn(
          `No getter function has been defined for computed property "${key}".`,
          vm
        )
        getter = noop
      }
    }
    // create internal watcher for the computed property.
    /*
      create an internal Watcher for the computed property and store it in vm._computedWatchers.
      computedWatcherOptions passes lazy: true, which makes the watcher start out dirty.
    */
    watchers[key] = new Watcher(vm, getter, noop, computedWatcherOptions)

    // component-defined computed properties are already defined on the
    // component prototype. We only need to define computed properties defined
    // at instantiation here.
    /* computed properties already present on the component prototype are not redefined */
    if (!(key in vm)) {
      /* define the computed property */
      defineComputed(vm, key, userDef)
    } else if (process.env.NODE_ENV !== 'production') {
      /* warn when the computed property name collides with data or props */
      if (key in vm.$data) {
        warn(`The computed property "${key}" is already defined in data.`, vm)
      } else if (vm.$options.props && key in vm.$options.props) {
        warn(`The computed property "${key}" is already defined as a prop.`, vm)
      }
    }
  }
}
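The lazy/dirty caching behind computed properties can be sketched like this (a simplified model; in real Vue the watcher is marked dirty by its deps rather than by a manual update() call, and LazyWatcher/readDoubled are illustrative names):

```javascript
// Sketch of the lazy/dirty caching behind computed properties: a lazy
// watcher computes nothing up front, evaluates on first access, then
// serves the cached value until it is marked dirty again.
class LazyWatcher {
  constructor (getter) {
    this.getter = getter
    this.dirty = true      // lazy: true means "not computed yet / stale"
    this.value = undefined
  }
  evaluate () {
    this.value = this.getter()
    this.dirty = false
  }
  update () {              // in Vue, a Dep calls this when a dependency changes
    this.dirty = true
  }
}

let computeCount = 0
const state = { a: 1 }
const doubled = new LazyWatcher(() => { computeCount++; return state.a * 2 })

// what the computed property's getter does: evaluate only when dirty
function readDoubled () {
  if (doubled.dirty) doubled.evaluate()
  return doubled.value
}
```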

Declaring a component

var Profile3 = Vue.component({
  template: `<div id="demo">
  <button v-on:click="show = !show">
    Toggle
  </button>
  <transition name="fade">
    <p v-if="show">hello</p>
  </transition>
</div>`,
  data: function () {
    return {
      firstName: '',
      lastName: 'White',
      alias: 'Heisenberg',
      show: true      
    }
  }
})

Here the interesting parts are the Vue.component and Vue.extend methods.

Vue.extend (core/global-api/extend.js) is essentially parasitic-combination inheritance from Vue:

	/*
   Create a "subclass" of the base Vue constructor:
   an extended constructor with the given options baked in, ready for reuse.
   The argument is an object of component options.  https://cn.vuejs.org/v2/api/#Vue-extend-options
   */
  Vue.extend = function (extendOptions: Object): Function {
    ......

    /*
      The Sub constructor just calls _init, exactly like Vue's own constructor,
      so the extended constructor initializes data, lifecycle, etc. the same way.
    */
    const Sub = function VueComponent (options) {
      this._init(options)
    }
    /* inherit from the parent */
    Sub.prototype = Object.create(Super.prototype)
    /* restore the constructor reference */
    Sub.prototype.constructor = Sub
    /* assign a fresh cid */
    Sub.cid = cid++
    /* merge the parent's options with the subclass options (Vue itself is the base class with cid 0 and contributes the default options) */
    Sub.options = mergeOptions(
      Super.options,
      extendOptions
    )
    /* record the parent on Sub.super */
    Sub['super'] = Super
    
    ......
    return Sub
  }
}
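A simplified standalone sketch of that subclass factory (Object.assign stands in for Vue's much richer mergeOptions; Base, extend and Comp are illustrative names):

```javascript
// Simplified sketch of Vue.extend: build a subclass whose prototype chain
// points at the parent constructor and whose options merge the parent's.
function Base (options) {
  this.$options = options
}
Base.options = { base: true }

let cid = 0
function extend (Super, extendOptions) {
  const Sub = function VueComponent (options) {
    // same shape as Vue's constructor: delegate initialization to the parent
    Super.call(this, Object.assign({}, Sub.options, options))
  }
  Sub.prototype = Object.create(Super.prototype) // inherit instance methods
  Sub.prototype.constructor = Sub
  Sub.cid = ++cid                                // fresh cid per subclass
  Sub.options = Object.assign({}, Super.options, extendOptions)
  Sub['super'] = Super                           // remember the parent
  return Sub
}

const Comp = extend(Base, { template: '<div/>' })
const vm = new Comp({ msg: 'hi' })
```

Instances of Comp pass instanceof checks against Base, and Comp.options carries both the parent's defaults and the extension options.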

Vue.component is similar to Vue.extend; see core/global-api/assets.js:

if (type === 'component' && isPlainObject(definition)) {
  definition.name = definition.name || id
  definition = this.options._base.extend(definition)// vue.extend
}

Mounting a component

Vue.prototype._init ends by calling vm.$mount(vm.$options.el).

The flow (the original post includes a flowchart here):

First, what is the vdom? It's simple: each component corresponds to a VNode tree kept inside Vue, so the new and old trees can be diffed and only the differences rendered.

VNode:

export default class VNode {

  constructor (
    tag?: string,
    data?: VNodeData,
    children?: ?Array<VNode>,
    text?: string,
    elm?: Node,
    context?: Component,
    componentOptions?: VNodeComponentOptions
  ) {
    /* tag name of this node */
    this.tag = tag
    /* the data object for this node: a VNodeData with the node's attributes and other details */
    this.data = data
    /* child nodes, as an array */
    this.children = children
    /* text content of this node */
    this.text = text
    /* the real DOM node this virtual node corresponds to */
    this.elm = elm
    /* namespace of this node */
    this.ns = undefined
    /* compilation scope of this node */
    this.context = context
    /* functional component scope */
    this.functionalContext = undefined
    /* the node's key attribute, used as an identity hint for diff optimization */
    this.key = data && data.key
    /* the component's options */
    this.componentOptions = componentOptions
    /* the component instance corresponding to this node */
    this.componentInstance = undefined
    /* parent node */
    this.parent = undefined
    /* roughly: raw HTML vs plain text; true for innerHTML, false for textContent */
    this.raw = false
    /* whether this is a static node */
    this.isStatic = false
    /* whether it is inserted as a root node */
    this.isRootInsert = true
    /* whether it is a comment node */
    this.isComment = false
    /* whether it is a cloned node */
    this.isCloned = false
    /* whether it carries a v-once directive */
    this.isOnce = false
  }

  // DEPRECATED: alias for componentInstance for backwards compat.
  /* istanbul ignore next */
  get child (): Component | void {
    return this.componentInstance
  }
}

The main logic of the render/patch step is roughly as follows (flowchart in the original post):

patch

  function patch (oldVnode, vnode, hydrating, removeOnly, parentElm, refElm) {
    /* if the new vnode doesn't exist, just invoke the destroy hooks on the old one */
    if (isUndef(vnode)) {
      if (isDef(oldVnode)) invokeDestroyHook(oldVnode)
      return
    }

    let isInitialPatch = false
    const insertedVnodeQueue = []

    if (isUndef(oldVnode)) {
      // empty mount (likely as component), create new root element
      /* no old vnode, i.e. the root mount: create a fresh element */
      isInitialPatch = true
      createElm(vnode, insertedVnodeQueue, parentElm, refElm)
    } else {
      /* check whether the old VNode is a real DOM element (has nodeType) */
      const isRealElement = isDef(oldVnode.nodeType)
      if (!isRealElement && sameVnode(oldVnode, vnode)) {
        // patch existing root node
        /* same node: patch the existing root node in place */
        patchVnode(oldVnode, vnode, insertedVnodeQueue, removeOnly)
      } else {
        createElm(
          vnode,
          insertedVnodeQueue,
          // extremely rare edge case: do not insert if old element is in a
          // leaving transition. Only happens when combining transition +
          // keep-alive + HOCs. (#4590)
          oldElm._leaveCb ? null : parentElm,
          nodeOps.nextSibling(oldElm)
        )
      }
    }

    /* invoke the insert hooks */
    invokeInsertHook(vnode, insertedVnodeQueue, isInitialPatch)
    return vnode.elm
  }

patchVnode:

/* when the new VNode has no text */
    if (isUndef(vnode.text)) {
      if (isDef(oldCh) && isDef(ch)) {
        /* both old and new nodes have children: diff the children via updateChildren */
        if (oldCh !== ch) updateChildren(elm, oldCh, ch, insertedVnodeQueue, removeOnly)
      } else if (isDef(ch)) {
        /* only the new node has children: clear the element's text, then add the child nodes */
        if (isDef(oldVnode.text)) nodeOps.setTextContent(elm, '')
        addVnodes(elm, null, ch, 0, ch.length - 1, insertedVnodeQueue)
      } else if (isDef(oldCh)) {
        /* only the old node has children: remove them all */
        removeVnodes(elm, oldCh, 0, oldCh.length - 1)
      } else if (isDef(oldVnode.text)) {
        /* neither has children and the new node has no text, so just clear the element's text */
        nodeOps.setTextContent(elm, '')
      }
    } else if (oldVnode.text !== vnode.text) {
      /* text differs between old and new: replace it */
      nodeOps.setTextContent(elm, vnode.text)
    }

If both nodes have children, updateChildren(elm, oldCh, ch, insertedVnodeQueue, removeOnly) is called:

function updateChildren (parentElm, oldCh, newCh, insertedVnodeQueue, removeOnly) {
    let oldStartIdx = 0
    let newStartIdx = 0
    let oldEndIdx = oldCh.length - 1
    let oldStartVnode = oldCh[0]
    let oldEndVnode = oldCh[oldEndIdx]
    let newEndIdx = newCh.length - 1
    let newStartVnode = newCh[0]
    let newEndVnode = newCh[newEndIdx]
    let oldKeyToIdx, idxInOld, elmToMove, refElm

    // removeOnly is a special flag used only by <transition-group>
    // to ensure removed elements stay in correct relative positions
    // during leaving transitions
    const canMove = !removeOnly

    while (oldStartIdx <= oldEndIdx && newStartIdx <= newEndIdx) {
      if (isUndef(oldStartVnode)) {
        oldStartVnode = oldCh[++oldStartIdx] // Vnode has been moved left
      } else if (isUndef(oldEndVnode)) {
        oldEndVnode = oldCh[--oldEndIdx]
      } else if (sameVnode(oldStartVnode, newStartVnode)) {
        /* the first four cases: with keys set and the nodes judged to be the same VNode, just patchVnode directly, comparing the heads and tails of oldCh and newCh (2*2 = 4 combinations) */
        patchVnode(oldStartVnode, newStartVnode, insertedVnodeQueue)
        oldStartVnode = oldCh[++oldStartIdx]
        newStartVnode = newCh[++newStartIdx]
      } else if (sameVnode(oldEndVnode, newEndVnode)) {
        patchVnode(oldEndVnode, newEndVnode, insertedVnodeQueue)
        oldEndVnode = oldCh[--oldEndIdx]
        newEndVnode = newCh[--newEndIdx]
      } else if (sameVnode(oldStartVnode, newEndVnode)) { // Vnode moved right
        patchVnode(oldStartVnode, newEndVnode, insertedVnodeQueue)
        canMove && nodeOps.insertBefore(parentElm, oldStartVnode.elm, nodeOps.nextSibling(oldEndVnode.elm))
        oldStartVnode = oldCh[++oldStartIdx]
        newEndVnode = newCh[--newEndIdx]
      } else if (sameVnode(oldEndVnode, newStartVnode)) { // Vnode moved left
        patchVnode(oldEndVnode, newStartVnode, insertedVnodeQueue)
        canMove && nodeOps.insertBefore(parentElm, oldEndVnode.elm, oldStartVnode.elm)
        oldEndVnode = oldCh[--oldEndIdx]
        newStartVnode = newCh[++newStartIdx]
      } else {
        /*
          Build a key-to-index hash over the old VNodes (generated lazily the first time this
          branch runs; it also helps detect duplicate keys later).
          E.g. for children [{xx: xx, key: 'key0'}, {xx: xx, key: 'key1'}, {xx: xx, key: 'key2'}]
          with beginIdx = 0 and endIdx = 2, the result is {key0: 0, key1: 1, key2: 2}.
        */
        if (isUndef(oldKeyToIdx)) oldKeyToIdx = createKeyToOldIdx(oldCh, oldStartIdx, oldEndIdx)
        /* if newStartVnode has a key and that key exists among the old VNodes, idxInOld is that old node's index */
        idxInOld = isDef(newStartVnode.key) ? oldKeyToIdx[newStartVnode.key] : null
        if (isUndef(idxInOld)) { // New element
          /* newStartVnode has no key, or its key is not among the old nodes: create a new element */
          createElm(newStartVnode, insertedVnodeQueue, parentElm, oldStartVnode.elm)
          newStartVnode = newCh[++newStartIdx]
        } else {
          /* fetch the old node with the same key */
          elmToMove = oldCh[idxInOld]
          /* istanbul ignore if */
          if (process.env.NODE_ENV !== 'production' && !elmToMove) {
            /* elmToMove missing means a new node was already patched into this key's slot: likely duplicate keys; each v-for item needs a unique key */
            warn(
              'It seems there are duplicate keys that is causing an update error. ' +
              'Make sure each v-for item has a unique key.'
            )
          }
          if (sameVnode(elmToMove, newStartVnode)) {
            /* the new VNode and the old node with the same key are the same VNode: patchVnode them */
            patchVnode(elmToMove, newStartVnode, insertedVnodeQueue)
            /* mark the old slot undefined now that it's patched, so a later node with the same key is flagged as a duplicate */
            oldCh[idxInOld] = undefined
            /* with the canMove flag set, insert directly before oldStartVnode's real DOM node */
            canMove && nodeOps.insertBefore(parentElm, newStartVnode.elm, oldStartVnode.elm)
            newStartVnode = newCh[++newStartIdx]
          } else {
            // same key but different element. treat as new element
            /* same key but not sameVnode (e.g. a different tag, or an input with a different type): create a new element */
            createElm(newStartVnode, insertedVnodeQueue, parentElm, oldStartVnode.elm)
            newStartVnode = newCh[++newStartIdx]
          }
        }
      }
    }
    if (oldStartIdx > oldEndIdx) {
      /* old children exhausted first: the remaining new nodes are created one by one and added to the real DOM */
      refElm = isUndef(newCh[newEndIdx + 1]) ? null : newCh[newEndIdx + 1].elm
      addVnodes(parentElm, refElm, newCh, newStartIdx, newEndIdx, insertedVnodeQueue)
    } else if (newStartIdx > newEndIdx) {
      /* new children exhausted first: the leftover old nodes are removed from the real DOM */
      removeVnodes(parentElm, oldCh, oldStartIdx, oldEndIdx)
    }
  }
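The key-to-index map built in the else branch above can be sketched on its own (a standalone re-implementation for illustration; the real helper lives in vdom/patch.js):

```javascript
// Sketch of createKeyToOldIdx: map each keyed old child in [beginIdx, endIdx]
// to its index, so a new node's key can be matched against an old node in O(1)
// instead of scanning the whole old children array.
function createKeyToOldIdx (children, beginIdx, endIdx) {
  const map = {}
  for (let i = beginIdx; i <= endIdx; i++) {
    const key = children[i] && children[i].key
    if (key != null) map[key] = i   // unkeyed children are simply skipped
  }
  return map
}

const oldCh = [{ key: 'key0' }, { key: 'key1' }, { key: 'key2' }]
const keyMap = createKeyToOldIdx(oldCh, 0, 2)
```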

There are plenty of explanations of this diff online, and this post is already long enough, so I won't repeat them here. I recommend the article "Vue 2.0 的 virtual-dom 实现简析".

SSR

const clientBundleFileUrl = '/bundle.client.js';
const clientBundleFilePath = path.join(__dirname, '../dist/bundle.client.js');

// Server-Side Bundle File
const serverBundleFilePath = path.join(__dirname, '../dist/bundle.server.js')


// Server-Side Rendering
app.get('/', function (req, res) {
  // const vm = new App({ url: req.url })
  const serverBundleFileCode = fs.readFileSync(serverBundleFilePath, 'utf8');
  const bundleRenderer = vueServerRenderer.createBundleRenderer(serverBundleFileCode);
  
  // Client-Side Bundle File
  

  const stream = bundleRenderer.renderToStream()

  res.write(`<!DOCTYPE html><html><head><title>...</title></head><body>`)

  stream.on('data', chunk => {
    console.log(chunk.toString())
    res.write(chunk)
  })

  stream.on('end', () => {
    res.end('</body></html>')
  })
});

The flow (the original post includes a flowchart here):

const renderer = createRenderer(rendererOptions); the renderer it returns relies mainly on renderNode(), which assembles HTML piece by piece from the VNodes:

if (isDef(node.tag)) {
    renderElement(node, isRoot, context)

const run = createBundleRunner(entry, files, basedir, runInNewContext) returns a function that produces a Promise; the bundled code is evaluated inside it, and the Promise resolves with the app instance:

return (userContext = {}) => new Promise(resolve => {
  userContext._registeredComponents = new Set()
  const res = evaluate(entry, createContext(userContext))
  resolve(typeof res === 'function' ? res(userContext) : res)
})

So how is our bundled SSR code actually executed? createBundleRunner contains the following snippet. NativeModule.wrap() is the same method Node itself uses to wrap every module:

const code = files[filename]
const wrapper = NativeModule.wrap(code)
const script = new vm.Script(wrapper, {
  filename,
  displayErrors: true
})

Next, renderToStream, which really just calls renderer.renderToStream:

renderToStream (
  component: Component,
  context?: Object
): stream$Readable {
  if (context) {
    templateRenderer.bindRenderFns(context)
  }
  const renderStream = new RenderStream((write, done) => {
    render(component, write, context, done)
  })
  if (!template) {
    return renderStream
  } else {
    const templateStream = templateRenderer.createStream(context)
    renderStream.on('error', err => {
      templateStream.emit('error', err)
    })
    renderStream.pipe(templateStream)
    return templateStream
  }
}

It constructs a RenderStream; the RenderStream code is as follows:

export default class RenderStream extends stream.Readable {

  constructor (render: Function) {
    super()
    this.buffer = ''
    this.render = render
    this.expectedSize = 0

    this.write = createWriteFunction((text, next) => {
      const n = this.expectedSize
      this.buffer += text
      if (this.buffer.length >= n) {
        this.next = next
        this.pushBySize(n)
        return true // we will decide when to call next
      }
      return false
    }, err => {
      this.emit('error', err)
    })

    this.end = () => {
      // the rendering is finished; we should push out the last of the buffer.
      this.done = true
      this.push(this.buffer)
    }
  }

  pushBySize (n: number) {
    const bufferToPush = this.buffer.substring(0, n)
    this.buffer = this.buffer.substring(n)
    this.push(bufferToPush)
  }

  tryRender () {
    try {
      this.render(this.write, this.end)
    } catch (e) {
      this.emit('error', e)
    }
  }

  tryNext () {
    try {
      this.next()
    } catch (e) {
      this.emit('error', e)
    }
  }

  _read (n: number) {
    this.expectedSize = n
    // it's possible that the last chunk added bumped the buffer up to > 2 * n,
    // which means we will need to go through multiple read calls to drain it
    // down to < n.
    if (isTrue(this.done)) {
      this.push(null)
      return
    }
    if (this.buffer.length >= n) {
      this.pushBySize(n)
      return
    }
    if (isUndef(this.next)) {
      // start the rendering chain.
      this.tryRender()
    } else {
      // continue with the rendering.
      this.tryNext()
    }
  }
}

The key method is _read().

From the Node documentation: every Readable implementation must provide a _read() method to fetch data from the underlying resource. When _read() is called and data is available, the implementation should push it into the read queue with this.push(dataChunk), and keep reading and pushing until readable.push() returns false. _read() will only be called again after more data has been pushed into the stream.

RenderStream extends stream.Readable and implements _read() accordingly: when rendering is done it pushes the rest of the buffer; when the buffer already holds the requested n bytes (Node asks for up to 16384 by default), it flushes them with this.pushBySize(n); otherwise it starts the render chain with this.tryRender(), or resumes it with this.tryNext().

tryRender:

const context = new RenderContext({
    activeInstance: component,
    userContext,
    write, done, renderNode,
    isUnaryTag, modules, directives,
    cache
})
installSSRHelpers(component)
normalizeRender(component)
renderNode(component._render(), true, context)

Things worth borrowing

1. Code organization. The Vue codebase is loosely coupled: the core sits in core/, and platforms/web and platforms/weex extend it with little friction. The layout is clear too, with one module doing one job: compiler/ compiles templates and nothing else, so the code is easy to navigate.

2. Caching is used well: essentially everywhere work might be repeated, the result is cached.

Vue plugins

How custom plugins work

Almost all Vue plugins are registered through Vue.use(), which takes a plugin object (or function) as its argument and runs its install method:

Vue.use = function (plugin: Function | Object) {
    /* istanbul ignore if */
    /* skip plugins that have already been installed */
    if (plugin.installed) {
      return
    }
    // additional parameters
    const args = toArray(arguments, 1)
    /* prepend the Vue constructor so install receives it as its first argument */
    args.unshift(this)
    if (typeof plugin.install === 'function') {
      /* object plugin: run its install method */
      plugin.install.apply(plugin, args)
    } else if (typeof plugin === 'function') {
      /* function plugin: call it directly */
      plugin.apply(null, args)
    }
    plugin.installed = true
    return this
}
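To see the contract Vue.use() enforces end to end, here is a self-contained sketch: a stand-in Vue object plus a hypothetical plugin (the names FakeVue and MyLogger are invented for illustration, not part of Vue).

```javascript
// Stand-in for the Vue constructor, just enough to host a Vue.use() clone.
const FakeVue = { prototype: {} };
FakeVue.use = function (plugin, ...args) {
  if (plugin.installed) return this;      // install at most once
  args.unshift(this);                     // prepend the "Vue" constructor
  if (typeof plugin.install === 'function') {
    plugin.install.apply(plugin, args);   // object plugin
  } else if (typeof plugin === 'function') {
    plugin.apply(null, args);             // function plugin
  }
  plugin.installed = true;
  return this;
};

// Hypothetical plugin: adds a $log method to every "component".
const MyLogger = {
  install(Vue, options = {}) {
    Vue.prototype.$log = msg => `[${options.prefix || 'log'}] ${msg}`;
  }
};

FakeVue.use(MyLogger, { prefix: 'app' });
// Any object inheriting from FakeVue.prototype now has $log.
const vm = Object.create(FakeVue.prototype);
```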

So when we pass a store while instantiating Vue, how does every component get this.$store?

The usual trick is for install to register a beforeCreate hook (via a global mixin) that copies the relevant property from the options (on the root) or from the parent component onto the child. Since a parent's beforeCreate runs before its children's, registration flows from the top down. Vuex is a good example:

export default function (Vue) {
  /* check the Vue version to distinguish Vue 1.x from 2.x */
  const version = Number(Vue.version.split('.')[0])

  if (version >= 2) {
    /* mix vuexInit into every instance's beforeCreate hook */
    Vue.mixin({ beforeCreate: vuexInit })
  } else {
    // override init and inject vuex init procedure
    // for 1.x backwards compatibility.
    /* prepend vuexInit to the init options passed to _init */
    const _init = Vue.prototype._init
    Vue.prototype._init = function (options = {}) {
      options.init = options.init
        ? [vuexInit].concat(options.init)
        : vuexInit
      _init.call(this, options)
    }
  }

  /**
   * Vuex init hook, injected into each instances init hooks list.
   */
  function vuexInit () {
    const options = this.$options
    // store injection
    if (options.store) {
      /* a store option means this is the root instance: call it (function) or use it directly */
      this.$store = typeof options.store === 'function'
        ? options.store()
        : options.store
    } else if (options.parent && options.parent.$store) {
      /* children take $store from their parent, so every component shares the same global store */
      this.$store = options.parent.$store
    }
  }
}
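The parent-before-child ordering can be simulated without Vue at all; the sketch below mimics vuexInit with plain objects standing in for component instances (the vm shapes are invented for illustration).

```javascript
// Same logic as vuexInit above, operating on plain objects standing in for vms.
function vuexInit(vm) {
  const options = vm.$options;
  if (options.store) {
    // root instance: take the store from its own options
    vm.$store = typeof options.store === 'function'
      ? options.store()
      : options.store;
  } else if (options.parent && options.parent.$store) {
    // child instance: inherit the parent's store reference
    vm.$store = options.parent.$store;
  }
}

const store = { state: { count: 0 } };
const root = { $options: { store } };
vuexInit(root);                          // beforeCreate fires on the root first
const child = { $options: { parent: root } };
vuexInit(child);                         // then on the child
```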

Vuex

Let's start with the Store constructor:

constructor (options = {}) {
   ......
    this._modules = new ModuleCollection(options)
    /* modules indexed by namespace */
    this._modulesNamespaceMap = Object.create(null)
    /* subscribers */
    this._subscribers = []
    /* an internal Vue instance used to implement store.watch */
    this._watcherVM = new Vue()

    // bind commit and dispatch to self
    /* bind dispatch and commit to the store itself; otherwise `this` inside
       this.dispatch called from a component would point at the component vm */
    const store = this
    const { dispatch, commit } = this
    this.dispatch = function boundDispatch (type, payload) {
      return dispatch.call(store, type, payload)
    }
    this.commit = function boundCommit (type, payload, options) {
      return commit.call(store, type, payload, options)
    }

    /* initialize the root module; this recursively registers all child modules
       and collects every module's getters into _wrappedGetters.
       this._modules.root is the Module object held only by the root */
    installModule(this, state, [], this._modules.root)

    ......
  }

this._modules = new ModuleCollection(options) initializes the modules and returns a module tree shaped like this:

  • rootModule
  • _children (a key/value map)
  • _rawModule
  • state
  • namespace (parent/son/xxx/xxx); the path can be used to locate the parent state

installModule(this, state, [], this._modules.root) then walks this tree, recursively registering each module's mutations, actions, and so on:

/* register every mutation */
  module.forEachMutation((mutation, key) => {
    const namespacedType = namespace + key
    registerMutation(store, namespacedType, mutation, local)
  })

  /* register every action */
  module.forEachAction((action, key) => {
    const namespacedType = namespace + key
    registerAction(store, namespacedType, action, local)
  })

  /* register every getter */
  module.forEachGetter((getter, key) => {
    const namespacedType = namespace + key
    registerGetter(store, namespacedType, getter, local)
  })

  /* recursively install child modules */
  module.forEachChild((child, key) => {
    installModule(store, rootState, path.concat(key), child, hot)
  })
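The recursion above builds namespaced keys like 'a/b/increment'. Real Vuex only prefixes keys for modules declared with `namespaced: true`; the sketch below (with hypothetical helper names, and namespacing applied unconditionally for brevity) shows how the path accumulates into the key:

```javascript
// Walk a nested module tree and collect mutations under namespaced keys.
// `module` shape assumed: { mutations: {...}, modules: { name: subModule } }.
function collectMutations(module, path = [], out = {}) {
  const namespace = path.length ? path.join('/') + '/' : '';
  for (const key of Object.keys(module.mutations || {})) {
    const type = namespace + key;
    // same-named mutations from different modules share one array
    (out[type] = out[type] || []).push(module.mutations[key]);
  }
  for (const name of Object.keys(module.modules || {})) {
    collectMutations(module.modules[name], path.concat(name), out);
  }
  return out;
}

const tree = {
  mutations: { inc: s => s },
  modules: {
    a: {
      mutations: { inc: s => s },
      modules: { b: { mutations: { dec: s => s } } }
    }
  }
};
const registered = collectMutations(tree);
```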

The resulting Store has this shape:

  • _mutations
    • { key (namespace + key): [] }
  • _actions (action handlers run commit/dispatch internally, so a dedicated local context is built for each module; its types are namespace + type)
    • { key (namespace + key): [] }
  • _modules

What is the implementation difference between mutations and actions? As the code below shows, an action runs its handler and normalizes the result into a Promise, while a mutation handler is invoked synchronously with the module's local state.

/* register a mutation */
function registerMutation (store, type, handler, local) {
  /* mutations with the same namespaced type are pushed into one array,
     so same-named mutations in different modules can all be invoked */
  const entry = store._mutations[type] || (store._mutations[type] = [])
  entry.push(function wrappedMutationHandler (payload) {
    handler.call(store, local.state, payload)
  })
}
/* register an action */
function registerAction (store, type, handler, local) {
  /* fetch the handler list for this type */
  const entry = store._actions[type] || (store._actions[type] = [])
  entry.push(function wrappedActionHandler (payload, cb) {
    let res = handler.call(store, {
      dispatch: local.dispatch,
      commit: local.commit,
      getters: local.getters,
      state: local.state,
      rootGetters: store.getters,
      rootState: store.state
    }, payload, cb)
    /* normalize the result into a Promise */
    if (!isPromise(res)) {
      res = Promise.resolve(res)
    }
    if (store._devtoolHook) {
      /* if devtools is attached, forward errors to it as vuex:error */
      return res.catch(err => {
        store._devtoolHook.emit('vuex:error', err)
        throw err
      })
    } else {
      return res
    }
  })
}
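The essential difference can be reduced to a few lines (a simplification for illustration, not Vuex's actual code): a mutation runs synchronously against state, while an action's return value is always wrapped so dispatch() is then-able.

```javascript
// Simplified wrappers mirroring wrappedMutationHandler / wrappedActionHandler.
function wrapMutation(handler, state) {
  return payload => { handler(state, payload); };   // synchronous
}
function wrapAction(handler, context) {
  // Promise.resolve passes real Promises through untouched and wraps
  // plain return values, so the caller can always chain .then().
  return payload => Promise.resolve(handler(context, payload));
}

const state = { count: 0 };
const commitInc = wrapMutation((s, n) => { s.count += n; }, state);
const dispatchInc = wrapAction(
  (ctx, n) => { commitInc(n); return ctx.state.count; },
  { state }
);

commitInc(1); // synchronous: state.count is 1 immediately
```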

How does the store's state join Vue's reactivity system? By placing it in the data of an internal Vue instance, so every property of state is observed:

store._vm = new Vue({
    data: {
      $$state: state
    },
    computed
  })

Vue-router

Its install hook works much like Vuex's, with two extras: it runs router.init on the root instance and registers the router-view and router-link components.

/* mixed into every Vue instance via the beforeCreate and destroyed hooks */
  Vue.mixin({
    /* beforeCreate hook */
    beforeCreate () {
      if (isDef(this.$options.router)) {
        /* a router option means this is the root component */
        /* keep a reference to the root vm */
        this._routerRoot = this
        /* keep a reference to the router */
        this._router = this.$options.router
        /* run the VueRouter instance's init method */
        this._router.init(this)
        /* Vue internal helper: defineProperty the field so changes are notified */
        Vue.util.defineReactive(this, '_route', this._router.history.current)
      } else {
        /* non-root components take it from their parent */
        this._routerRoot = (this.$parent && this.$parent._routerRoot) || this
      }
      /* register the component instance via registerRouteInstance */
      registerInstance(this, this)
    },
    destroyed () {
      registerInstance(this)
    }
  })

  /* bind $router on Vue.prototype so any component can use this.$router;
     through Object.defineProperty, this.$router resolves to this._routerRoot._router */
  Object.defineProperty(Vue.prototype, '$router', {
    get () { return this._routerRoot._router }
  })

  /* likewise, this.$route resolves to this._routerRoot._route */
  Object.defineProperty(Vue.prototype, '$route', {
    get () { return this._routerRoot._route }
  })

  /* register the router-view and router-link components */
  Vue.component('RouterView', View)
  Vue.component('RouterLink', Link)

Next, the VueRouter constructor:

constructor (options: RouterOptions = {}) {
    this.app = null
    /* the registered vm instances */
    this.apps = []
    this.options = options
    this.beforeHooks = []
    this.resolveHooks = []
    this.afterHooks = []
    this.matcher = createMatcher(options.routes || [], this)

    let mode = options.mode || 'hash'
    this.fallback = mode === 'history' && !supportsPushState && options.fallback !== false
    if (this.fallback) {
      mode = 'hash'
    }
    if (!inBrowser) {
      mode = 'abstract'
    }
    this.mode = mode

    switch (mode) {
      case 'history':
        this.history = new HTML5History(this, options.base)
        break
      case 'hash':
        this.history = new HashHistory(this, options.base, this.fallback)
        break
      case 'abstract':
        this.history = new AbstractHistory(this, options.base)
        break
      default:
        if (process.env.NODE_ENV !== 'production') {
          assert(false, `invalid mode: ${mode}`)
        }
    }
  }

this.matcher = createMatcher(options.routes || [], this) builds pathList, pathMap, and nameMap and uses them to find the route record matching a given location.

pathMap maps each path to its route record (the original structure diagram is omitted here).

this.matcher.match finds the matching route and returns a route object. The main steps: normalize the location (normalizeLocation), look up the record in nameMap or pathList/pathMap, and build the route object.

Parameters: raw location (RawLocation), currentRoute, redirectedFrom.

1. normalizeLocation(raw, currentRoute, false, router)

Steps: resolvePath, resolveQuery, handleHash.
It returns:
return {
    _normalized: true,
    path,
    query,
    hash
}

resolvePath
The relative-path logic:
const segments = relative.replace(/^\//, '').split('/')
for (let i = 0; i < segments.length; i++) {
    const segment = segments[i]
    if (segment === '..') {
        stack.pop()
     } else if (segment !== '.') {
        stack.push(segment)
    }
}
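Filling in the missing pieces around that loop gives a runnable version. This is a reconstruction under stated assumptions, not vue-router's exact source (the real resolvePath also takes an `append` flag); `base` is assumed to be an absolute path.

```javascript
// Resolve `relative` against an absolute `base` path:
// '.' segments are skipped, '..' pops a segment off the stack.
function resolvePath(relative, base) {
  if (relative.charAt(0) === '/') return relative; // already absolute
  const stack = base.split('/');
  stack.pop(); // drop the last segment of the base, as a browser would
  const segments = relative.replace(/^\//, '').split('/');
  for (const segment of segments) {
    if (segment === '..') {
      stack.pop();
    } else if (segment !== '.') {
      stack.push(segment);
    }
  }
  // ensure a leading slash survives
  if (stack[0] !== '') stack.unshift('');
  return stack.join('/');
}
```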

2. If the location has a name, the record comes straight out of nameMap; if it has a path, pathList is traversed and each record is tested with matchRoute.

matchRoute(record.regex, location.path, location.params) copies the regex capture groups into params.

3. createRoute
_createRoute(record, location, redirectedFrom) finally builds the route object:


const route: Route = {
    name: location.name || (record && record.name),
    meta: (record && record.meta) || {},    
    path: location.path || '/',
    hash: location.hash || '',
    query,
    params: location.params || {},
    fullPath: getFullPath(location, stringifyQuery),
    matched: record ? formatMatch(record) : []
}

router.history comes in three flavors: history, hash, and abstract. history and hash are the familiar HTML5 History API and URL-hash modes; abstract is vue-router's own in-memory mode that records the route history in a stack. In outline:

HTML5 history:

pushState/replaceState

Hash:

window.addEventListener(supportsPushState ? 'popstate' : 'hashchange', ...)

// build a '#hash' URL, then use pushState/replaceState when supported

Abstract:

this.stack = this.stack.slice(0, this.index + 1).concat(route)
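A sketch of the abstract idea (invented, minimal class; the real AbstractHistory also routes every transition through confirmTransition): pushing discards any "forward" entries, and go() just moves the index.

```javascript
// In-memory route history: a stack plus an index.
class AbstractHistory {
  constructor() {
    this.stack = [];
    this.index = -1;
  }
  push(route) {
    // drop forward entries, then append -- the line quoted above
    this.stack = this.stack.slice(0, this.index + 1).concat(route);
    this.index++;
  }
  go(n) {
    const target = this.index + n;
    if (target < 0 || target >= this.stack.length) return; // out of range
    this.index = target;
  }
  get current() {
    return this.stack[this.index];
  }
}

const h = new AbstractHistory();
h.push('/a');
h.push('/b');
h.go(-1); // back to '/a'
```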

router-view and router-link are vue-router's built-in components.

Starting with router-view: in its render function it walks up to the root to determine the current router-view's depth (i.e. how deeply the route's children are nested), picks the matched component for that depth, and renders it.

/* the router-view component */
export default {
  name: 'RouterView',
  /*
    https://cn.vuejs.org/v2/api/#functional
    Makes the component stateless (no data) and instanceless (no `this` context);
    it renders purely via a render function returning virtual nodes.
  */
  functional: true,
  props: {
    name: {
      type: String,
      default: 'default'
    }
  },
  render (_, { props, children, parent, data }) {
  
  ......
  
    /* _routerRoot holds the root instance; walk up the parent chain to the
       root, counting nested router-views to get this view's depth */
    while (parent && parent._routerRoot !== parent) {
      if (parent.$vnode && parent.$vnode.data.routerView) {
        depth++
      }
      /* _inactive means the component sits inside a keep-alive and is currently inactive */
      if (parent._inactive) {
        inactive = true
      }
      parent = parent.$parent
    }
    /* record this router-view's depth */
    data.routerViewDepth = depth
    
	......
	
    /* register the instance's registration hook; it is called from the
       injected lifecycle hooks (beforeCreate and destroyed) */
    data.registerRouteInstance = (vm, val) => {  
      /* a missing second argument means unregistration */
      // val could be undefined for unregistration
      /* the current component instance */
      const current = matched.instances[name]
      if (
        (val && current !== vm) ||
        (!val && current === vm)
      ) {
        /* two cases: val exists and replaces the current instance, or val is
           undefined and clears the instance entry */
        matched.instances[name] = val
      }
    }

   ......

    return h(component, data, children)
  }
}

So how does router-view learn that the route changed and re-render? This comes back to Vue's reactivity: when reactive data changes, Vue re-runs vm._update(vm._render()) as described earlier.

On install, vue-router makes _route reactive via Vue.util.defineReactive(this, '_route', this._router.history.current); when navigation completes, updateRoute() assigns the new route to _route on every registered app:

history.listen(route => {
  this.apps.forEach((app) => {
    app._route = route
  })
})

router-link is simpler: it renders an <a> tag by default, listens for click, and decides between router.push and router.replace.

How do routes map to components, and what happens when the route changes?

1. Find the matched route:
	const route = this.router.match(location, this.current)

2. confirmTransition

	A. Work out which route records are deactivated, reused, and newly activated.
	B. Run the navigation guards on those records' component instances, in order:
		beforeRouteLeave => beforeRouteUpdate => beforeRouteEnter

3. Perform the actual navigation:
	if (typeof to === 'object' && to.replace) {
	    this.replace(to)
	} else {
	    this.push(to)
	}

4. In confirmTransition's callback, update app._route:
	updateRoute (route: Route) {
	    ...
	    this.cb && this.cb(route) // => app._route = route
	    ...
	}
	onComplete (the History subclass then runs handleScroll and other hooks)
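The guard chain in step 2 is driven by an async queue runner; the sketch below matches the shape of vue-router's runQueue helper (simplified: no abort handling). Each guard receives a `next` callback and the chain only advances when it is called.

```javascript
// Run async guards one at a time; each guard must call next() to continue.
function runQueue(queue, iterator, onComplete) {
  const step = index => {
    if (index >= queue.length) return onComplete();
    iterator(queue[index], () => step(index + 1));
  };
  step(0);
}

// Usage: three "guards" recording their execution order.
const order = [];
const guards = ['beforeRouteLeave', 'beforeRouteUpdate', 'beforeRouteEnter'];
runQueue(
  guards,
  (guard, next) => { order.push(guard); next(); },
  () => order.push('complete')
);
```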

V8 source: memory management

This article looks at V8's memory management from the source-code perspective, covering memory allocation and garbage collection.

To keep the focus on the core ideas, it uses a very early V8 version, 0.1.5, whose implementation is simple enough that the design shines through.

V8将内存空间分为几个区域,分别是NewSpace、OldSpace、LargeObjectSpace、MapSpace、CodeSpace,各个space的关系如下图所示:

各个space的作用:

LargeObjectSpace :为了避免大对象的拷贝,使用该空间专门存储大对象(大小超过Normal Page能容纳的对象范围),包括Code、Sequetial String、FixedArray;

MapSpace :存放对象的Map信息,即hidden_class;最大限制为8MB;每个Map对象固定大小,为了快速定位,所以将该空间单独出来;

NewSpace :存放多种类型对象,最大限制为2MB;

CodeSpace :存放预编译代码(?);最大限制为512MB;

Old_Pointer_Space :存放GC后surviving的指针对象;最大限制为512MB;

Old_Data_Space :存放GC后surviving的数据对象;最大限制为512MB;

Initialization

Heap setup happens right after V8 initializes its OS-level parameters; the entry point is in src/heap.cc:

bool Heap::Setup(bool create_heap_objects) {
  // Initialize heap spaces and initial maps and objects. Whenever something
  // goes wrong, just return false. The caller should check the results and
  // call Heap::TearDown() to release allocated memory.
  //
  // If the heap is not yet configured (eg, through the API), configure it.
  // Configuration is based on the flags new-space-size (really the semispace
  // size) and old-space-size if set or the initial values of semispace_size_
  // and old_generation_size_ otherwise.
  if (!heap_configured) {
    if (!ConfigureHeap(FLAG_new_space_size, FLAG_old_space_size)) return false;
  }

  // Setup memory allocator and allocate an initial chunk of memory.  The
  // initial chunk is double the size of the new space to ensure that we can
  // find a pair of semispaces that are contiguous and aligned to their size.
  // allocate heap memory: new space + old spaces
  // setup chunks
  // MemoryAllocator is a singleton
  if (!MemoryAllocator::Setup(MaxCapacity())) return false;
  // reserve 2 * young_generation_size_ of virtual memory, kept in MemoryAllocator::initial_chunk_
  void* chunk
      = MemoryAllocator::ReserveInitialChunk(2 * young_generation_size_);
  if (chunk == NULL) return false;

  // Put the initial chunk of the old space at the start of the initial
  // chunk, then the two new space semispaces, then the initial chunk of
  // code space.  Align the pair of semispaces to their size, which must be
  // a power of 2.
  ASSERT(IsPowerOf2(young_generation_size_));
  Address old_space_start = reinterpret_cast<Address>(chunk);
  // laid out from the low-address end upward
  Address new_space_start = RoundUp(old_space_start, young_generation_size_);
  Address code_space_start = new_space_start + young_generation_size_;
  int old_space_size = new_space_start - old_space_start;
  int code_space_size = young_generation_size_ - old_space_size;

  // Initialize new space.
  new_space_ = new NewSpace(initial_semispace_size_, semispace_size_);
  if (new_space_ == NULL) return false;
  // mmap the from_space and to_space
  if (!new_space_->Setup(new_space_start, young_generation_size_)) return false;

  // Initialize old space, set the maximum capacity to the old generation
  // size.
  // pagedSpace.setup
  old_space_ = new OldSpace(old_generation_size_, OLD_SPACE);
  if (old_space_ == NULL) return false;
  if (!old_space_->Setup(old_space_start, old_space_size)) return false;

  // Initialize the code space, set its maximum capacity to the old
  // generation size.
  code_space_ = new OldSpace(old_generation_size_, CODE_SPACE);
  if (code_space_ == NULL) return false;
  if (!code_space_->Setup(code_space_start, code_space_size)) return false;

  // Initialize map space.
  map_space_ = new MapSpace(kMaxMapSpaceSize);
  if (map_space_ == NULL) return false;
  // Setting up a paged space without giving it a virtual memory range big
  // enough to hold at least a page will cause it to allocate.
  if (!map_space_->Setup(NULL, 0)) return false;

  lo_space_ = new LargeObjectSpace();
  if (lo_space_ == NULL) return false;
  if (!lo_space_->Setup()) return false;

  if (create_heap_objects) {
    // Create initial maps.
    if (!CreateInitialMaps()) return false;
    if (!CreateApiObjects()) return false;

    // Create initial objects
    if (!CreateInitialObjects()) return false;
  }

  LOG(IntEvent("heap-capacity", Capacity()));
  LOG(IntEvent("heap-available", Available()));

  return true;
}

Heap::Setup does the following:

1. Configures the heap parameters, including young_generation_size_ (2 MB) and old_generation_size_ (512 MB); the old generation is managed in pages of 8 KB each, and old_generation_size_ bounds how many of them it may hold.

2. MemoryAllocator::Setup initializes the chunk bookkeeping used to manage pages; one chunk holds up to 64 pages.

3. Reserves 2 * young_generation_size_ of virtual memory, recorded in MemoryAllocator::initial_chunk_ (note that MemoryAllocator is a singleton).

4. Carves the individual spaces out of that reserved virtual memory.

The reservation is split into new_space_, old_space_, code_space_, map_space_, and lo_space_, laid out as in the figure (omitted here).

The sections below look at how each space is initialized; the code lives in src/spaces.cc.

NewSpace

bool NewSpace::Setup(Address start, int size) {
  ASSERT(size == 2 * maximum_capacity_);
  ASSERT(IsAddressAligned(start, size, 0));

  if (to_space_ == NULL
      || !to_space_->Setup(start, maximum_capacity_)) {
    return false;
  }
  if (from_space_ == NULL
      || !from_space_->Setup(start + maximum_capacity_, maximum_capacity_)) {
    return false;
  }

  start_ = start;
  address_mask_ = ~(size - 1);
  object_mask_ = address_mask_ | kHeapObjectTag;
  object_expected_ = reinterpret_cast<uint32_t>(start) | kHeapObjectTag;

  allocation_info_.top = to_space_->low();
  allocation_info_.limit = to_space_->high();
  mc_forwarding_info_.top = NULL;
  mc_forwarding_info_.limit = NULL;

  ASSERT_SEMISPACE_ALLOCATION_INFO(allocation_info_, to_space_);
  return true;
}

This initializes to_space_ and from_space_, both of type SemiSpace:

bool SemiSpace::Setup(Address start, int size) {
  ASSERT(size == maximum_capacity_);
  if (!MemoryAllocator::CommitBlock(start, capacity_)) return false;

  start_ = start;
  address_mask_ = ~(size - 1);
  object_mask_ = address_mask_ | kHeapObjectTag;
  object_expected_ = reinterpret_cast<uint32_t>(start) | kHeapObjectTag;

  age_mark_ = start_;
  return true;
}

bool MemoryAllocator::CommitBlock(Address start, size_t size) {
  ASSERT(start != NULL);
  ASSERT(size > 0);
  ASSERT(initial_chunk_ != NULL);
  ASSERT(initial_chunk_->address() <= start);
  ASSERT(start + size <= reinterpret_cast<Address>(initial_chunk_->address())
                             + initial_chunk_->size());

  // mmap
  if (!initial_chunk_->Commit(start, size)) return false;
  Counters::memory_allocated.Increment(size);
  return true;
}

The work happens in MemoryAllocator::CommitBlock, which commits a block of the reserved virtual memory; initial_chunk_->Commit is really VirtualMemory::Commit:

bool VirtualMemory::Commit(void* address, size_t size) {
  if (MAP_FAILED == mmap(address, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
                         kMmapFd, kMmapFdOffset)) {
    return false;
  }

  UpdateAllocatedSpaceLimits(address, size);
  return true;
}

So a commit is just an mmap over a block of the reserved virtual memory. For how mmap relates to malloc and brk, see 'Linux内存分配小结--malloc、brk、mmap'.

OldSpace

OldSpace inherits from PagedSpace, so old_space_->Setup actually invokes the base class method PagedSpace::Setup:

bool PagedSpace::Setup(Address start, size_t size) {
  if (HasBeenSetup()) return false;

  int num_pages = 0;
  // Try to use the virtual memory range passed to us.  If it is too small to
  // contain at least one page, ignore it and allocate instead.
  // if the range fits inside the reserved virtual memory
  if (PagesInChunk(start, size) > 0) {
    first_page_ = MemoryAllocator::CommitPages(start, size, this, &num_pages);
  } else {
    // how many pages to request
    int requested_pages = Min(MemoryAllocator::kPagesPerChunk,
                              max_capacity_ / Page::kObjectAreaSize);
    first_page_ =
        MemoryAllocator::AllocatePages(requested_pages, &num_pages, this);
    if (!first_page_->is_valid()) return false;
  }

  // We are sure that the first page is valid and that we have at least one
  // page.
  ASSERT(first_page_->is_valid());
  ASSERT(num_pages > 0);
  accounting_stats_.ExpandSpace(num_pages * Page::kObjectAreaSize);
  ASSERT(Capacity() <= max_capacity_);

  for (Page* p = first_page_; p->is_valid(); p = p->next_page()) {
    // remembered-set bookkeeping used by mark-compact GC
    p->ClearRSet();
  }

  // Use first_page_ for allocation.
  SetAllocationInfo(&allocation_info_, first_page_);

  return true;
}

PagedSpace::Setup does the following:

1. Checks whether the reserved virtual memory can hold the space:
	a. if it can, MemoryAllocator::CommitPages commits pages inside the reservation;
	b. if not, MemoryAllocator::AllocatePages maps fresh virtual memory.
2. Walks all pages and clears their remembered sets, used later by the garbage collector.

Next, MemoryAllocator::CommitPages and MemoryAllocator::AllocatePages.

MemoryAllocator::CommitPages is called when the reserved virtual memory can hold the space (here OLD_SPACE):

Page* MemoryAllocator::CommitPages(Address start, size_t size,
                                   PagedSpace* owner, int* num_pages) {
  ASSERT(start != NULL);
  *num_pages = PagesInChunk(start, size);
  ASSERT(*num_pages > 0);
  ASSERT(initial_chunk_ != NULL);
  ASSERT(initial_chunk_->address() <= start);
  ASSERT(start + size <= reinterpret_cast<Address>(initial_chunk_->address())
                             + initial_chunk_->size());

  if (!initial_chunk_->Commit(start, size)) {
    return Page::FromAddress(NULL);
  }
  Counters::memory_allocated.Increment(size);

  // So long as we correctly overestimated the number of chunks we should not
  // run out of chunk ids.
  CHECK(!OutOfChunkIds());
  int chunk_id = Pop();
  chunks_[chunk_id].init(start, size, owner);
  return InitializePagesInChunk(chunk_id, *num_pages, owner);
}

CommitPages does two things:

1. Commits the reserved virtual memory via initial_chunk_->Commit, i.e. the VirtualMemory::Commit shown earlier.
2. Initializes chunks_; note that chunks_ is an array of ChunkInfo entries recording each chunk's start, size, and owning space.

MemoryAllocator::AllocatePages is called when the reservation cannot hold the space:

Page* MemoryAllocator::AllocatePages(int requested_pages, int* allocated_pages,
                                     PagedSpace* owner) {
  if (requested_pages <= 0) return Page::FromAddress(NULL);
  size_t chunk_size = requested_pages * Page::kPageSize;

  // There is not enough space to guarantee the desired number pages can be
  // allocated.
  // not enough room for the full request: take whatever capacity is left
  if (size_ + static_cast<int>(chunk_size) > capacity_) {
    // Request as many pages as we can.
    chunk_size = capacity_ - size_;
    requested_pages = chunk_size >> Page::kPageSizeBits;

    if (requested_pages <= 0) return Page::FromAddress(NULL);
  }

  void* chunk = AllocateRawMemory(chunk_size, &chunk_size);
  if (chunk == NULL) return Page::FromAddress(NULL);
  LOG(NewEvent("PagedChunk", chunk, chunk_size));

  *allocated_pages = PagesInChunk(static_cast<Address>(chunk), chunk_size);
  // less than one page is useless: the request fails and the memory is munmap'ed
  if (*allocated_pages == 0) {
    FreeRawMemory(chunk, chunk_size);
    LOG(DeleteEvent("PagedChunk", chunk));
    return Page::FromAddress(NULL);
  }

  // initialize the new chunk
  int chunk_id = Pop();
  chunks_[chunk_id].init(static_cast<Address>(chunk), chunk_size, owner);

  return InitializePagesInChunk(chunk_id, *allocated_pages, owner);
}

AllocatePages does the following:

1. If the request would exceed capacity_, it shrinks the request to whatever capacity remains.
2. It maps memory with MemoryAllocator::AllocateRawMemory.
3. If the mapped region cannot hold even one page (8 KB), the allocation is abandoned and the memory is released with munmap.
4. It initializes a new chunk, the same bookkeeping as in MemoryAllocator::CommitPages.

Now for MemoryAllocator::AllocateRawMemory:

void* MemoryAllocator::AllocateRawMemory(const size_t requested,
                                         size_t* allocated) {
  if (size_ + static_cast<int>(requested) > capacity_) return NULL;

  // mmap & UpdateAllocatedSpaceLimits
  void* mem = OS::Allocate(requested, allocated);
  int alloced = *allocated;
  size_ += alloced;
  Counters::memory_allocated.Increment(alloced);
  return mem;
}

The OS::Allocate it calls:

void* OS::Allocate(const size_t requested, size_t* allocated) {
  const size_t msize = RoundUp(requested, getpagesize());
  void* mbase = mmap(NULL, msize, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mbase == MAP_FAILED) {
    LOG(StringEvent("OS::Allocate", "mmap failed"));
    return NULL;
  }
  *allocated = msize;
  UpdateAllocatedSpaceLimits(mbase, msize);
  return mbase;
}

This is mmap again, but with NULL as the first argument, the requested start address: here the kernel picks the location, unlike the commits into the reserved region above.

One more note: committing into a pre-reserved region helps read/write efficiency, since a commit of already-reserved memory does not have to rework the logical-to-physical address mapping, i.e. the process page tables.

Other spaces

The remaining spaces are also PagedSpace subclasses and initialize much like OldSpace, so they are not repeated here.

Allocation

A few concepts before the allocation code itself:

1. Page: within a PagedSpace, memory is managed in pages, and an object may not span pages (V8's page size matches the OS page granularity here).
2. Chunk: the heap grows chunk by chunk; one chunk holds up to 64 pages, which reduces the number of mmap system calls.
3. Free list: internal fragmentation inside each page is collected into a free list, much as an operating system's memory manager does.

The allocation entry point is Heap::AllocateRaw in src/heap-inl.h:

// allocation entry point
Object* Heap::AllocateRaw(int size_in_bytes, AllocationSpace space) {
  ASSERT(allocation_allowed_ && gc_state_ == NOT_IN_GC);
#ifdef DEBUG
  if (FLAG_gc_interval >= 0 &&
      !disallow_allocation_failure_ &&
      Heap::allocation_timeout_-- <= 0) {
    return Failure::RetryAfterGC(size_in_bytes, space);
  }
  Counters::objs_since_last_full.Increment();
  Counters::objs_since_last_young.Increment();
#endif
  if (NEW_SPACE == space) {
    return new_space_->AllocateRaw(size_in_bytes);
  }

  Object* result;
  if (OLD_SPACE == space) {
    result = old_space_->AllocateRaw(size_in_bytes);
  } else if (CODE_SPACE == space) {
    result = code_space_->AllocateRaw(size_in_bytes);
  } else if (LO_SPACE == space) {
    result = lo_space_->AllocateRaw(size_in_bytes);
  } else {
    ASSERT(MAP_SPACE == space);
    result = map_space_->AllocateRaw(size_in_bytes);
  }
  if (result->IsFailure()) old_gen_exhausted_ = true;
  return result;
}

This simply dispatches to the AllocateRaw method of the requested space.

OldSpace

Recall that OldSpace inherits from PagedSpace, so its memory management is page-based.

AllocateRaw looks like this:

// Allocates requested bytes. May return Failure if the space is full.
  Object* AllocateRaw(int size_in_bytes) {
    ASSERT_OBJECT_SIZE(size_in_bytes);
    return AllocateRawInternal(size_in_bytes, &allocation_info_);
}

AllocateRawInternal:

Object* OldSpace::AllocateRawInternal(int size_in_bytes,
                                      AllocationInfo* alloc_info) {
  ASSERT(HasBeenSetup());

  if (allocation_mode_ == LINEAR_ONLY || allocation_mode_ == LINEAR) {
    // Try linear allocation in the current page.
    Address cur_top = alloc_info->top;
    Address new_top = cur_top + size_in_bytes;
    if (new_top <= alloc_info->limit) {
      Object* obj = HeapObject::FromAddress(cur_top);
      alloc_info->top = new_top;
      ASSERT_PAGED_ALLOCATION_INFO(*alloc_info);

      accounting_stats_.AllocateBytes(size_in_bytes);
      ASSERT(Size() <= Capacity());
      return obj;
    }
  } else {
    // For now we should not try free list allocation during m-c relocation.
    // allocate from the free list
    ASSERT(alloc_info == &allocation_info_);
    int wasted_bytes;
    Object* object = free_list_.Allocate(size_in_bytes, &wasted_bytes);
    accounting_stats_.WasteBytes(wasted_bytes);
    if (!object->IsFailure()) {
      accounting_stats_.AllocateBytes(size_in_bytes);
      return object;
    }
  }
  // Fast allocation failed.
  return SlowAllocateRaw(size_in_bytes, alloc_info);
}

The logic:

1. Check the allocation mode:
	a. linear: see whether the current page still has room;
		i. if it does, carve out the region and advance alloc_info->top;
		ii. if it does not, fall through to step 2 (SlowAllocateRaw).
	b. otherwise allocate from free_list_; on failure, fall through to step 2.
2. The fast path failed, so run SlowAllocateRaw.

SlowAllocateRaw:

// Slow cases for AllocateRawInternal.  In linear allocation mode, try
// to allocate in the next page in the space.  If there are no more
// pages, switch to free-list allocation if permitted, otherwise try
// to grow the space.  In free-list allocation mode, try to grow the
// space and switch to linear allocation.
Object* OldSpace::SlowAllocateRaw(int size_in_bytes,
                                  AllocationInfo* alloc_info) {
  if (allocation_mode_ == LINEAR_ONLY || allocation_mode_ == LINEAR) {
    // the last allocated page; memory grows from low to high addresses
    Page* top_page = TopPageOf(*alloc_info);
    // Until we implement free-list allocation during global gc, we have two
    // cases: one for normal allocation and one for m-c relocation allocation.
    // first_page
    if (alloc_info == &allocation_info_) {  // Normal allocation.
      // how much space is left on the last page
      int free_size = top_page->ObjectAreaEnd() - alloc_info->top;
      // Add the extra space at the top of this page to the free list.
      // hand the leftover to the free list and bump top to the page end
      if (free_size > 0) {
        int wasted_bytes = free_list_.Free(alloc_info->top, free_size);
        accounting_stats_.WasteBytes(wasted_bytes);
        alloc_info->top += free_size;
        ASSERT_PAGED_ALLOCATION_INFO(*alloc_info);
      }

      // Move to the next page in this space if there is one; switch
      // to free-list allocation, if we can; try to expand the space otherwise
      // move on to the next page
      if (top_page->next_page()->is_valid()) {
        SetAllocationInfo(alloc_info, top_page->next_page());
      }
      // switch allocation_mode_ to FREE_LIST so later allocations come from the free list
      else if (allocation_mode_ == LINEAR) {
        allocation_mode_ = FREE_LIST;
      }
      // expand a chunk
      else if (Expand(top_page)) {
        ASSERT(top_page->next_page()->is_valid());
        SetAllocationInfo(alloc_info, top_page->next_page());
      }
      // garbage-collect and retry
      else {
        return Failure::RetryAfterGC(size_in_bytes, identity());
      }
    } else {  // Allocation during m-c relocation.
      // During m-c 'allocation' while computing forwarding addresses, we do
      // not yet add blocks to the free list because they still contain live
      // objects.  We also cache the m-c forwarding allocation pointer in the
      // current page.

      // If there are no more pages try to expand the space.  This can only
      // happen when promoting objects from the new space.
      if (!top_page->next_page()->is_valid()) {
        if (!Expand(top_page)) {
          return Failure::RetryAfterGC(size_in_bytes, identity());
        }
      }

      // Move to the next page.
      ASSERT(top_page->next_page()->is_valid());
      top_page->mc_relocation_top = alloc_info->top;
      SetAllocationInfo(alloc_info, top_page->next_page());
    }
  } else {  // Free-list allocation.
    // We failed to allocate from the free list; try to expand the space and
    // switch back to linear allocation.
    ASSERT(alloc_info == &allocation_info_);
    Page* top_page = TopPageOf(*alloc_info);
    if (!top_page->next_page()->is_valid()) {
      if (!Expand(top_page)) {
        return Failure::RetryAfterGC(size_in_bytes, identity());
      }
    }

    // We surely have more pages, move to the next page and switch to linear
    // allocation.
    ASSERT(top_page->next_page()->is_valid());
    SetAllocationInfo(alloc_info, top_page->next_page());
    ASSERT(allocation_mode_ == FREE_LIST);
    allocation_mode_ = LINEAR;
  }

  // Perform the allocation.
  return AllocateRawInternal(size_in_bytes, alloc_info);
}

In the linear case, SlowAllocateRaw does the following:

1. TopPageOf fetches the last allocated page, top_page.
2. It computes how much space is left on that page.
3. If that leftover is positive, it is handed to free_list_ and top moves to the end of the page.
4. It then looks for somewhere to allocate, with four options:
	a. if top_page->next_page() is valid, move allocation to that page and go to step 5 (AllocateRawInternal);
	b. if the mode was LINEAR, switch it to FREE_LIST and go to step 5, so the retry draws from free_list_;
	c. expand by one more chunk (within the space's maximum capacity, at most 64 pages) and go to step 5;
	d. if none of the above worked, return Failure::RetryAfterGC: collect garbage, then retry.
5. Call AllocateRawInternal again.
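The interplay of linear bumping and the free-list fallback can be condensed into a toy model (a JavaScript stand-in with invented names, tracking integer offsets rather than real addresses): linear allocation bumps top; when the page is full, the free list is searched; when both fail, real V8 would garbage-collect and retry.

```javascript
// Toy model of one page with bump-pointer allocation plus a free list.
class ToyPage {
  constructor(size) {
    this.size = size;
    this.top = 0;        // linear allocation pointer
    this.freeList = [];  // blocks of { addr, size }
  }
  allocate(n) {
    if (this.top + n <= this.size) {   // linear: bump the top pointer
      const addr = this.top;
      this.top += n;
      return addr;
    }
    // free-list fallback: first block big enough
    const i = this.freeList.findIndex(b => b.size >= n);
    if (i >= 0) {
      const block = this.freeList[i];
      block.size -= n;                 // carve n bytes off the block's tail
      const addr = block.addr + block.size;
      if (block.size === 0) this.freeList.splice(i, 1);
      return addr;
    }
    return -1;                         // would trigger GC-and-retry in V8
  }
  free(addr, n) {
    this.freeList.push({ addr, size: n });
  }
}
```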

The Expand call above resolves to the base-class method PagedSpace::Expand:

bool PagedSpace::Expand(Page* last_page) {
  ASSERT(max_capacity_ % Page::kObjectAreaSize == 0);
  ASSERT(Capacity() % Page::kObjectAreaSize == 0);

  if (Capacity() == max_capacity_) return false;

  ASSERT(Capacity() < max_capacity_);
  // Last page must be valid and its next page is invalid.
  ASSERT(last_page->is_valid() && !last_page->next_page()->is_valid());

  // pages still available under max capacity
  int available_pages = (max_capacity_ - Capacity()) / Page::kObjectAreaSize;
  if (available_pages <= 0) return false;

  // at most one chunk's worth of pages
  int desired_pages = Min(available_pages, MemoryAllocator::kPagesPerChunk);
  Page* p = MemoryAllocator::AllocatePages(desired_pages, &desired_pages, this);
  if (!p->is_valid()) return false;

  accounting_stats_.ExpandSpace(desired_pages * Page::kObjectAreaSize);
  ASSERT(Capacity() <= max_capacity_);

  MemoryAllocator::SetNextPage(last_page, p);

  // Clear remembered set of new pages.
  while (p->is_valid()) {
    p->ClearRSet();
    p = p->next_page();
  }

  return true;
}

Expand does the following:

1. Computes how many pages can still be allocated, available_pages.
2. Takes the minimum of available_pages and kPagesPerChunk (64) and calls MemoryAllocator::AllocatePages, covered above, to allocate a chunk and initialize its pages.
3. Links the new pages into the page list, i.e. hangs them off the old top_page.

In summary, OldSpace allocation follows the flow just described (the original flow chart is omitted here).

NewSpace

NewSpace::AllocateRawInternal:

Object* NewSpace::AllocateRawInternal(int size_in_bytes,
                                      AllocationInfo* alloc_info) {
  Address new_top = alloc_info->top + size_in_bytes;
  if (new_top > alloc_info->limit) {
    return Failure::RetryAfterGC(size_in_bytes, NEW_SPACE);
  }

  Object* obj = HeapObject::FromAddress(alloc_info->top);
  alloc_info->top = new_top;
#ifdef DEBUG
  SemiSpace* space =
      (alloc_info == &allocation_info_) ? to_space_ : from_space_;
  ASSERT(space->low() <= alloc_info->top
         && alloc_info->top <= space->high()
         && alloc_info->limit == space->high());
#endif
  return obj;
}

这里没有page的概念,直接移动top指针就好,空间不足则直接RetryAfterGC
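这种“只移动top指针”的线性分配,可以用一个极简的示意模型表达(BumpAllocator为假想命名,用整数模拟地址,返回0对应真实代码中的RetryAfterGC):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// NewSpace 线性分配的示意:只移动 top 指针,越界即失败
struct BumpAllocator {
  uintptr_t top;    // 对应 alloc_info->top
  uintptr_t limit;  // 对应 alloc_info->limit

  // 成功返回对象起始地址,失败返回 0(真实代码中返回 Failure::RetryAfterGC)
  uintptr_t Allocate(size_t size) {
    uintptr_t new_top = top + size;
    if (new_top > limit) return 0;  // 空间不足
    uintptr_t obj = top;            // 对象占据 [top, new_top)
    top = new_top;
    return obj;
  }
};
```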

LargeObjectSpace

LargeObjectSpace::AllocateRawInternal代码如下:

Object* LargeObjectSpace::AllocateRawInternal(int requested_size,
                                              int object_size) {
  ASSERT(0 < object_size && object_size <= requested_size);
  size_t chunk_size;
  LargeObjectChunk* chunk =
      LargeObjectChunk::New(requested_size, &chunk_size);
  if (chunk == NULL) {
    return Failure::RetryAfterGC(requested_size, LO_SPACE);
  }

  size_ += chunk_size;
  page_count_++;
  chunk->set_next(first_chunk_);
  chunk->set_size(chunk_size);
  first_chunk_ = chunk;

  // Set the object address and size in the page header and clear its
  // remembered set.
  Page* page = Page::FromAddress(RoundUp(chunk->address(), Page::kPageSize));
  Address object_address = page->ObjectAreaStart();
  // Clear the low order bit of the second word in the page to flag it as a
  // large object page.  If the chunk_size happened to be written there, its
  // low order bit should already be clear.
  ASSERT((chunk_size & 0x1) == 0);
  page->is_normal_page &= ~0x1;
  page->ClearRSet();
  int extra_bytes = requested_size - object_size;
  if (extra_bytes > 0) {
    // The extra memory for the remembered set should be cleared.
    memset(object_address + object_size, 0, extra_bytes);
  }

  return HeapObject::FromAddress(object_address);
}

LargeObjectChunk* LargeObjectChunk::New(int size_in_bytes,
                                        size_t* chunk_size) {
  size_t requested = ChunkSizeFor(size_in_bytes);
  void* mem = MemoryAllocator::AllocateRawMemory(requested, chunk_size);
  if (mem == NULL) return NULL;
  LOG(NewEvent("LargeObjectChunk", mem, *chunk_size));
  if (*chunk_size < requested) {
    MemoryAllocator::FreeRawMemory(mem, *chunk_size);
    LOG(DeleteEvent("LargeObjectChunk", mem));
    return NULL;
  }
  return reinterpret_cast<LargeObjectChunk*>(mem);
}

由于该空间中每个Page都只会存放一个对象,所以当申请内存块时,直接通过MemoryAllocator::AllocateRawMemory分出一块不小于对象大小(按页对齐取整)的内存,并加入到该空间的内存块管理链表中就可以了。
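上面LargeObjectChunk::New里的ChunkSizeFor会把申请的大小按页对齐向上取整,取整逻辑可以用如下示意表达(这里假设页大小为8KB,与早期v8的Page::kPageSize一致;RoundUpToPage为说明用的假想函数):

```cpp
#include <cassert>
#include <cstddef>

// 假设的页大小(8KB,2 的幂,才能用位运算做对齐)
constexpr size_t kPageSize = 8 * 1024;

// 把字节数向上取整到页边界的示意
constexpr size_t RoundUpToPage(size_t bytes) {
  return (bytes + kPageSize - 1) & ~(kPageSize - 1);
}
```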

内存析构

v8实例销毁时会调用V8::TearDown,其中会调用Heap::TearDown,Heap::TearDown代码如下:

void Heap::TearDown() {
  GlobalHandles::TearDown();

  if (new_space_ != NULL) {
    new_space_->TearDown();
    delete new_space_;
    new_space_ = NULL;
  }

  if (old_space_ != NULL) {
    old_space_->TearDown();
    delete old_space_;
    old_space_ = NULL;
  }

  if (code_space_ != NULL) {
    code_space_->TearDown();
    delete code_space_;
    code_space_ = NULL;
  }

  if (map_space_ != NULL) {
    map_space_->TearDown();
    delete map_space_;
    map_space_ = NULL;
  }

  if (lo_space_ != NULL) {
    lo_space_->TearDown();
    delete lo_space_;
    lo_space_ = NULL;
  }

  MemoryAllocator::TearDown();
}

其实就是把各自空间free掉。

垃圾回收

当对象申请内存空间失败,就会调用Failure::RetryAfterGC,这时会开始进行内存清理。垃圾回收的入口在src/heap-inl.h中,代码如下:

// Do not use the identifier __object__ in a call to this macro.
//
// Call the function FUNCTION_CALL.  If it fails with a RetryAfterGC
// failure, call the garbage collector and retry the function.  If the
// garbage collector cannot reclaim the required space or the second
// call fails with a RetryAfterGC failure, fail with out of memory.
// If there is any other failure, return a null handle.  If either
// call succeeds, return a handle to the functions return value.
//
// Note that this macro always returns or raises a fatal error.
#define CALL_HEAP_FUNCTION(FUNCTION_CALL, TYPE)                              \
  do {                                                                       \
    GC_GREEDY_CHECK();                                                       \
    Object* __object__ = FUNCTION_CALL;                                      \
    if (__object__->IsFailure()) {                                           \
      if (__object__->IsRetryAfterGC()) {                                    \
        if (!Heap::CollectGarbage(                                           \
                Failure::cast(__object__)->requested(),                      \
                Failure::cast(__object__)->allocation_space())) {            \
          /* TODO(1181417): Fix this. */                                     \
          v8::internal::V8::FatalProcessOutOfMemory("CALL_HEAP_FUNCTION");   \
        }                                                                    \
        __object__ = FUNCTION_CALL;                                          \
        if (__object__->IsFailure()) {                                       \
          if (__object__->IsRetryAfterGC()) {                                \
            /* TODO(1181417): Fix this. */                                   \
            v8::internal::V8::FatalProcessOutOfMemory("CALL_HEAP_FUNCTION"); \
          }                                                                  \
          return Handle<TYPE>();                                             \
        }                                                                    \
      } else {                                                               \
        return Handle<TYPE>();                                               \
      }                                                                      \
    }                                                                        \
    return Handle<TYPE>(TYPE::cast(__object__));                             \
  } while (false)

其中调用了Heap::CollectGarbage,代码如下:

bool Heap::CollectGarbage(int requested_size, AllocationSpace space) {
  // The VM is in the GC state until exiting this function.
  VMState state(GC);

#ifdef DEBUG
  // Reset the allocation timeout to the GC interval, but make sure to
  // allow at least a few allocations after a collection. The reason
  // for this is that we have a lot of allocation sequences and we
  // assume that a garbage collection will allow the subsequent
  // allocation attempts to go through.
  allocation_timeout_ = Max(6, FLAG_gc_interval);
#endif

  { GCTracer tracer;
    GarbageCollectionPrologue();

    GarbageCollector collector = SelectGarbageCollector(space);
    tracer.set_collector(collector);

    StatsRate* rate = (collector == SCAVENGER)
        ? &Counters::gc_scavenger
        : &Counters::gc_compactor;
    rate->Start();
    PerformGarbageCollection(space, collector);
    rate->Stop();

    GarbageCollectionEpilogue();
  }


#ifdef ENABLE_LOGGING_AND_PROFILING
  if (FLAG_log_gc) HeapProfiler::WriteSample();
#endif

  switch (space) {
    case NEW_SPACE:
      return new_space_->Available() >= requested_size;
    case OLD_SPACE:
      return old_space_->Available() >= requested_size;
    case CODE_SPACE:
      return code_space_->Available() >= requested_size;
    case MAP_SPACE:
      return map_space_->Available() >= requested_size;
    case LO_SPACE:
      return lo_space_->Available() >= requested_size;
  }
  return false;
}

这里主要做了两件事:

1.SelectGarbageCollector,选择垃圾回收器
2.PerformGarbageCollection,执行垃圾回收

Heap::SelectGarbageCollector选择垃圾回收器的代码如下:

GarbageCollector Heap::SelectGarbageCollector(AllocationSpace space) {
  // Is global GC requested?
  if (space != NEW_SPACE || FLAG_gc_global) {
    Counters::gc_compactor_caused_by_request.Increment();
    return MARK_COMPACTOR;
  }

  // Is enough data promoted to justify a global GC?
  if (PromotedSpaceSize() > promoted_space_limit_) {
    Counters::gc_compactor_caused_by_promoted_data.Increment();
    return MARK_COMPACTOR;
  }

  // Have allocation in OLD and LO failed?
  if (old_gen_exhausted_) {
    Counters::gc_compactor_caused_by_oldspace_exhaustion.Increment();
    return MARK_COMPACTOR;
  }

  // Is there enough space left in OLD to guarantee that a scavenge can
  // succeed?
  //
  // Note that old_space_->MaxAvailable() undercounts the memory available
  // for object promotion. It counts only the bytes that the memory
  // allocator has not yet allocated from the OS and assigned to any space,
  // and does not count available bytes already in the old space or code
  // space.  Undercounting is safe---we may get an unrequested full GC when
  // a scavenge would have succeeded.
  if (old_space_->MaxAvailable() <= new_space_->Size()) {
    Counters::gc_compactor_caused_by_oldspace_exhaustion.Increment();
    return MARK_COMPACTOR;
  }

  // Default
  return SCAVENGER;
}

垃圾回收算法主要有MarkCompact和Scavenge两种,这里有四种情况会返回MARK_COMPACTOR垃圾回收器,其余情况会返回SCAVENGER。四种情况分别是:

1.space不是NEW_SPACE,或者开启了全局GC(FLAG_gc_global)
2.已提升对象占用的空间(PromotedSpaceSize)大于提升空间限制promoted_space_limit_
3.之前在old_space_或lo_space_中分配失败
4.old_space_最大可用空间(MaxAvailable)小于等于new_space_已使用的空间
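把这四个条件布尔化之后,选择器的判定逻辑大致如下(Select、Collector均为说明用的假想命名):

```cpp
#include <cassert>

// 两种垃圾回收器
enum class Collector { Scavenger, MarkCompactor };

// SelectGarbageCollector 判定逻辑的示意:四个条件任一成立即选 Mark-Compact
Collector Select(bool not_new_space_or_global_gc,   // 条件1
                 bool promoted_over_limit,          // 条件2
                 bool old_gen_exhausted,            // 条件3
                 bool old_space_cannot_hold_new) {  // 条件4
  if (not_new_space_or_global_gc || promoted_over_limit ||
      old_gen_exhausted || old_space_cannot_hold_new) {
    return Collector::MarkCompactor;
  }
  return Collector::Scavenger;  // 默认走新生代回收
}
```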

Heap::PerformGarbageCollection执行垃圾回收的代码如下:

void Heap::PerformGarbageCollection(AllocationSpace space,
                                    GarbageCollector collector) {
  if (collector == MARK_COMPACTOR && global_gc_prologue_callback_) {
    ASSERT(!allocation_allowed_);
    global_gc_prologue_callback_();
  }

  if (collector == MARK_COMPACTOR) {
    MarkCompact();

    int promoted_space_size = PromotedSpaceSize();
    promoted_space_limit_ =
        promoted_space_size + Max(2 * MB, (promoted_space_size/100) * 35);
    old_gen_exhausted_ = false;

    // If we have used the mark-compact collector to collect the new
    // space, and it has not compacted the new space, we force a
    // separate scavenge collection.  THIS IS A HACK.  It covers the
    // case where (1) a new space collection was requested, (2) the
    // collector selection policy selected the mark-compact collector,
    // and (3) the mark-compact collector policy selected not to
    // compact the new space.  In that case, there is no more (usable)
    // free space in the new space after the collection compared to
    // before.
    if (space == NEW_SPACE && !MarkCompactCollector::HasCompacted()) {
      Scavenge();
    }
  } else {
    Scavenge();
  }
  Counters::objs_since_last_young.Set(0);

  // Process weak handles post gc.
  GlobalHandles::PostGarbageCollectionProcessing();

  if (collector == MARK_COMPACTOR && global_gc_epilogue_callback_) {
    ASSERT(!allocation_allowed_);
    global_gc_epilogue_callback_();
  }
}

这里主要根据选出的collector,在不同的space中执行不同的垃圾回收算法(MarkCompact或Scavenge)。

Scavenge

下面我们来看一下这两种垃圾回收算法:

新生代使用Scavenge算法进行回收。在Scavenge算法的实现中,主要采用了Cheney算法。

Cheney算法是一种采用复制方式实现的垃圾回收算法。它将内存一分为二,每一部分空间称为semispace。在这两个semispace中,一个处于使用状态,另一个处于闲置状态。处于使用状态的semispace称为From空间,处于闲置状态的称为To空间,分配对象时,先在From空间中进行分配。

当开始垃圾回收时,会检查From空间中的存活对象,这些存活对象将会被复制到To空间中(复制完成后会进行紧缩),而非活跃对象占用的空间将会被释放。完成复制后,From空间和To空间的角色发生对换。也就是说,垃圾回收的过程,就是将存活对象在两个semispace之间进行复制。

可以很容易看出来,使用Cheney算法时,总有一半的内存是空的。但由于新生代很小,浪费的内存空间并不大;而且新生代中的对象绝大部分都是非活跃对象,需要复制的活跃对象比例很小,所以其时间效率十分理想。复制的过程采用的是BFS(广度优先遍历)的思想,从根对象出发,广度优先遍历所有能到达的对象。

需要注意的是,v8中的from_space_和to_space_与算法中描述的正好相反,对象在to_space_中分配,from_space_作为复制的目标空间。

下面是Heap::Scavenge的代码:

void Heap::Scavenge() {
#ifdef DEBUG
  if (FLAG_enable_slow_asserts) {
    VerifyCodeSpacePointersVisitor v;
    HeapObjectIterator it(code_space_);
    while (it.has_next()) {
      HeapObject* object = it.next();
      if (object->IsCode()) {
        Code::cast(object)->ConvertICTargetsFromAddressToObject();
      }
      object->Iterate(&v);
      if (object->IsCode()) {
        Code::cast(object)->ConvertICTargetsFromObjectToAddress();
      }
    }
  }
#endif

  gc_state_ = SCAVENGE;

  // Implements Cheney's copying algorithm
  LOG(ResourceEvent("scavenge", "begin"));


  // 为了避免newspace由于空间过小而引起频繁地scavenge,于是在每次scavenge之前检查次数,如果超过限制次数(初始为8)且newspace能满足空间翻倍(初始为256KB,最大为2MB),则double空间以及该次数限制。这里的策略调整可以根据实际优化;
  scavenge_count_++;
  if (new_space_->Capacity() < new_space_->MaximumCapacity() &&
      scavenge_count_ > new_space_growth_limit_) {
    // Double the size of the new space, and double the limit.  The next
    // doubling attempt will occur after the current new_space_growth_limit_
    // more collections.
    // TODO(1240712): NewSpace::Double has a return value which is
    // ignored here.
    new_space_->Double();
    new_space_growth_limit_ *= 2;
  }

  // Flip the semispaces.  After flipping, to space is empty, from space has
  // live objects.
  // 交换from_space和to_space
  // 两个semispace的信息互换
  new_space_->Flip();
  // 重置allocation_info_
  new_space_->ResetAllocationInfo();

  // We need to sweep newly copied objects which can be in either the to space
  // or the old space.  For to space objects, we use a mark.  Newly copied
  // objects lie between the mark and the allocation top.  For objects
  // promoted to old space, we write their addresses downward from the top of
  // the new space.  Sweeping newly promoted objects requires an allocation
  // pointer and a mark.  Note that the allocation pointer 'top' actually
  // moves downward from the high address in the to space.
  //
  // There is guaranteed to be enough room at the top of the to space for the
  // addresses of promoted objects: every object promoted frees up its size in
  // bytes from the top of the new space, and objects are at least one pointer
  // in size.  Using the new space to record promoted addresses makes the
  // scavenge collector agnostic to the allocation strategy (eg, linear or
  // free-list) used in old space.
  // promoted object的指针在new_space中从后向前记录
  // to_space_->low()
  Address new_mark = new_space_->ToSpaceLow();
  Address promoted_mark = new_space_->ToSpaceHigh();
  promoted_top = new_space_->ToSpaceHigh();

  CopyVisitor copy_visitor;
  // Copy roots.
  IterateRoots(&copy_visitor);

  // Copy objects reachable from the old generation.  By definition, there
  // are no intergenerational pointers in code space.
  IterateRSet(old_space_, &CopyObject);
  IterateRSet(map_space_, &CopyObject);
  lo_space_->IterateRSet(&CopyObject);

  bool has_processed_weak_pointers = false;

  while (true) {
    ASSERT(new_mark <= new_space_->top());
    ASSERT(promoted_mark >= promoted_top);

    // Copy objects reachable from newly copied objects.
    // 广度优先遍历
    // 相等的时候停止
    // allocation_info_.top
    while (new_mark < new_space_->top() || promoted_mark > promoted_top) {
      // Sweep newly copied objects in the to space.  The allocation pointer
      // can change during sweeping.
      Address previous_top = new_space_->top();
      SemiSpaceIterator new_it(new_space_, new_mark);
      while (new_it.has_next()) {
        new_it.next()->Iterate(&copy_visitor);
      }
      new_mark = previous_top;

      // Sweep newly copied objects in the old space.  The promotion 'top'
      // pointer could change during sweeping.
      previous_top = promoted_top;
      for (Address current = promoted_mark - kPointerSize;
           current >= previous_top;
           current -= kPointerSize) {
        HeapObject* object = HeapObject::cast(Memory::Object_at(current));
        object->Iterate(&copy_visitor);
        UpdateRSet(object);
      }
      promoted_mark = previous_top;
    }

    if (has_processed_weak_pointers) break;  // We are done.
    // Copy objects reachable from weak pointers.
    GlobalHandles::IterateWeakRoots(&copy_visitor);
    has_processed_weak_pointers = true;
  }

  // Set age mark.
  new_space_->set_age_mark(new_mark);

  LOG(ResourceEvent("scavenge", "end"));

  gc_state_ = NOT_IN_GC;
}

这里其实就是依据上述的算法思想来执行相应逻辑,主要做了如下几件事:

1.为了避免newspace由于空间过小引起频繁地scavenge,每次scavenge之前检查已经scavenge的次数,如果超过限制次数(初始为8)且newspace能满足空间翻倍(初始为256KB,最大为2MB),则double空间以及该次数限制。
2.交换from_space和to_space,两个semispace的信息互换
3.重置allocation_info_
4.IterateRoots,遍历root对象(包括strong_root_list、struct_map、symbol、bootstrapper、top、debug、compilation cache、handlescope、builtins、globalhandles、threadmanager等),将其引用的、位于from_space(交换前的to_space)中的对象拷贝到to_space
5.在to_space中广度优先遍历各个节点,对to_space中引用的其他对象执行复制操作
6.最后再对global handle list中处于weak或pending状态的对象进行拷贝
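其中第1步的翻倍策略可以用如下示意表达(GrowthPolicy为假想命名,初始值均取自上文描述):

```cpp
#include <cassert>

// new_space 翻倍策略的示意:scavenge 次数超过阈值且容量未达上限时,
// 容量与阈值同时翻倍(初始容量 256KB、上限 2MB、阈值 8)
struct GrowthPolicy {
  int capacity = 256 * 1024;
  int max_capacity = 2 * 1024 * 1024;
  int growth_limit = 8;     // 对应 new_space_growth_limit_
  int scavenge_count = 0;   // 对应 scavenge_count_

  void OnScavenge() {
    ++scavenge_count;
    if (capacity < max_capacity && scavenge_count > growth_limit) {
      capacity *= 2;       // Double 空间
      growth_limit *= 2;   // 同时 double 次数限制
    }
  }
};
```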

下面我们挑几个重要的点详细讲解一下。

IterateRoots

IterateRoots(&copy_visitor)用来将根对象拷贝到to_space_,具体代码如下:

void Heap::IterateRoots(ObjectVisitor* v) {
  IterateStrongRoots(v);
  // copy object
  v->VisitPointer(reinterpret_cast<Object**>(&symbol_table_));
  SYNCHRONIZE_TAG("symbol_table");
}

void Heap::IterateStrongRoots(ObjectVisitor* v) {
#define ROOT_ITERATE(type, name) \
  v->VisitPointer(reinterpret_cast<Object**>(&name##_));
  STRONG_ROOT_LIST(ROOT_ITERATE);
#undef ROOT_ITERATE
  SYNCHRONIZE_TAG("strong_root_list");

#define STRUCT_MAP_ITERATE(NAME, Name, name) \
  v->VisitPointer(reinterpret_cast<Object**>(&name##_map_));
  STRUCT_LIST(STRUCT_MAP_ITERATE);
#undef STRUCT_MAP_ITERATE
  SYNCHRONIZE_TAG("struct_map");

#define SYMBOL_ITERATE(name, string) \
  v->VisitPointer(reinterpret_cast<Object**>(&name##_));
  SYMBOL_LIST(SYMBOL_ITERATE)
#undef SYMBOL_ITERATE
  SYNCHRONIZE_TAG("symbol");

  Bootstrapper::Iterate(v);
  SYNCHRONIZE_TAG("bootstrapper");
  Top::Iterate(v);
  SYNCHRONIZE_TAG("top");
  Debug::Iterate(v);
  SYNCHRONIZE_TAG("debug");

  // Iterate over local handles in handle scopes.
  HandleScopeImplementer::Iterate(v);
  SYNCHRONIZE_TAG("handlescope");

  // Iterate over the builtin code objects and code stubs in the heap. Note
  // that it is not strictly necessary to iterate over code objects on
  // scavenge collections.  We still do it here because this same function
  // is used by the mark-sweep collector and the deserializer.
  Builtins::IterateBuiltins(v);
  SYNCHRONIZE_TAG("builtins");

  // Iterate over global handles.
  GlobalHandles::IterateRoots(v);
  SYNCHRONIZE_TAG("globalhandles");

  // Iterate over pointers being held by inactive threads.
  ThreadManager::Iterate(v);
  SYNCHRONIZE_TAG("threadmanager");
}

这里拷贝了strong_root_list等根对象,拷贝的关键在于v->VisitPointer(reinterpret_cast<Object**>(&name##_));这段逻辑,这段逻辑实际调用的是CopyVisitor::VisitPointer,具体代码如下:

// Helper class for copying HeapObjects
class CopyVisitor: public ObjectVisitor {
 public:

  void VisitPointer(Object** p) {
    CopyObject(p);
  }

  void VisitPointers(Object** start, Object** end) {
    // Copy all HeapObject pointers in [start, end)
    for (Object** p = start; p < end; p++) CopyObject(p);
  }

 private:
  void CopyObject(Object** p) {
    if (!Heap::InFromSpace(*p)) return;
    Heap::CopyObject(reinterpret_cast<HeapObject**>(p));
  }
};

我们可以看到,实际调用的是Heap::CopyObject,代码如下:

void Heap::CopyObject(HeapObject** p) {
  ASSERT(InFromSpace(*p));

  HeapObject* object = *p;

  // We use the first word (where the map pointer usually is) of a
  // HeapObject to record the forwarding pointer.  A forwarding pointer can
  // point to the old space, the code space, or the to space of the new
  // generation.
  // 获取类型
  // (reinterpret_cast<byte*>(p) + offset - kHeapObjectTag)
  HeapObject* first_word = object->map();

  // If the first word (where the map pointer is) is not a map pointer, the
  // object has already been copied.  We do not use first_word->IsMap()
  // because we know that first_word always has the heap object tag.
  // 如果被引用的对象(live objects)已经被拷贝到to_space_,则简单地更新引用,通过forwarding pointer指向新的to_space_中的新对象
  if (first_word->map()->instance_type() != MAP_TYPE) {
    *p = first_word;
    return;
  }

  // Optimization: Bypass ConsString objects where the right-hand side is
  // Heap::empty_string().  We do not use object->IsConsString because we
  // already know that object has the heap object tag.
  InstanceType type = Map::cast(first_word)->instance_type();
  if (type < FIRST_NONSTRING_TYPE &&
      String::cast(object)->representation_tag() == kConsStringTag &&
      ConsString::cast(object)->second() == Heap::empty_string()) {
    object = HeapObject::cast(ConsString::cast(object)->first());
    *p = object;
    // After patching *p we have to repeat the checks that object is in the
    // active semispace of the young generation and not already copied.
    if (!InFromSpace(object)) return;
    first_word = object->map();
    if (first_word->map()->instance_type() != MAP_TYPE) {
      *p = first_word;
      return;
    }
    type = Map::cast(first_word)->instance_type();
  }

  int object_size = object->SizeFromMap(Map::cast(first_word));
  Object* result;
  // If the object should be promoted, we try to copy it to old space.
  // 上一次scavenge中survive(两次scavenge都survive) 或 to_space可用空间少于75%时
  if (ShouldBePromoted(object->address(), object_size)) {
    // Heap numbers and sequential strings are promoted to code space, all
    // other object types are promoted to old space.  We do not use
    // object->IsHeapNumber() and object->IsSeqString() because we already
    // know that object has the heap object tag.
    bool has_pointers =
        type != HEAP_NUMBER_TYPE &&
        (type >= FIRST_NONSTRING_TYPE ||
         String::cast(object)->representation_tag() != kSeqStringTag);
    if (has_pointers) {
      result = old_space_->AllocateRaw(object_size);
    } else {
      result = code_space_->AllocateRaw(object_size);
    }

    if (!result->IsFailure()) {
      // object->set_map()
      // set forwarding pointer
      *p = MigrateObject(p, HeapObject::cast(result), object_size);
      if (has_pointers) {
        // Record the object's address at the top of the to space, to allow
        // it to be swept by the scavenger.
        promoted_top -= kPointerSize;
        Memory::Object_at(promoted_top) = *p;
      } else {
#ifdef DEBUG
        // Objects promoted to the code space should not have pointers to
        // new space.
        VerifyCodeSpacePointersVisitor v;
        (*p)->Iterate(&v);
#endif
      }
      return;
    }
  }

  // The object should remain in new space or the old space allocation failed.
  result = new_space_->AllocateRaw(object_size);
  // Failed allocation at this point is utterly unexpected.
  ASSERT(!result->IsFailure());
  *p = MigrateObject(p, HeapObject::cast(result), object_size);
}

这里主要做了如下几件事:

1.如果被引用的对象(live objects)已经被拷贝到to_space_,则简单地更新引用,通过forwarding pointer指向新的to_space_中的新对象。这里需要注意的是from_space中的对象map pointer指向拷贝到to_space_或old_space中的新对象,这里也是通过这个map pointer来判断是否已经被拷贝
2.判断对象是否需要提升至old_space_或code_space_,如果是,则在相应空间上分配空间并设置promoted_top。这里需要注意的是,为了后面的广度优先遍历,需要记录已经提升的对象地址,所以会在to_space_中从空间末尾开始、从后向前记录提升的对象地址,promoted_top代表已提升对象地址的最顶端(反向)
3.最后,如果对象需要在new_space中分配或者old_space空间分配失败,则调用new_space_->AllocateRaw,并设置from_space中原有对象forwarding pointer指向新的to_space_中的新对象

这里需要注意下ShouldBePromoted方法,也就是对象晋升的条件,代码如下:

bool Heap::ShouldBePromoted(Address old_address, int object_size) {
  // An object should be promoted if:
  // - the object has survived a scavenge operation or
  // - to space is already 25% full.
  return old_address < new_space_->age_mark()
      || (new_space_->Size() + object_size) >= (new_space_->Capacity() >> 2);
}

从上述代码中可以看出,有两个条件可以触发变量提升:

1.To空间已经被使用了超过25%
2.对象在上一次scavenge中survive(两次scavenge都survive)

这里new_space_->age_mark()实际记录的是new_mark,即to_space中已分配空间的最顶端,也就是已拷贝对象的最顶端,后面介绍广度优先遍历to_space_时大家会看到。
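把这两个提升条件抽出来,判定逻辑大致如下(参数为简化后的标量,仅作示意,对应真实代码中的old_address、object_size、age_mark、Size()与Capacity()):

```cpp
#include <cassert>
#include <cstdint>

// ShouldBePromoted 两个条件的示意:
// 1. 地址在 age_mark 之前(上轮 scavenge 已存活);
// 2. 本轮 to 空间用量(含该对象)达到容量的 25%(Capacity >> 2)
bool ShouldBePromoted(uintptr_t old_address, int object_size,
                      uintptr_t age_mark, int space_size, int capacity) {
  return old_address < age_mark ||
         (space_size + object_size) >= (capacity >> 2);
}
```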

广度优先遍历to_space_

广度优先遍历to_space中已有的对象指针,其中包含新拷贝到to_space_中的对象和新拷贝到old_space_中的对象。其在to_space里遍历新拷贝到to_space_中的对象和新拷贝到old_space_中的对象时,各用到两个指针(mark、top)来表示遍历的位置,当四个指针两两重合时,遍历结束。

具体代码如下:

// Copy objects reachable from newly copied objects.
    // 广度优先遍历
    // 相等的时候停止
    // allocation_info_.top
    while (new_mark < new_space_->top() || promoted_mark > promoted_top) {
      // Sweep newly copied objects in the to space.  The allocation pointer
      // can change during sweeping.
      Address previous_top = new_space_->top();
      SemiSpaceIterator new_it(new_space_, new_mark);
      while (new_it.has_next()) {
        new_it.next()->Iterate(&copy_visitor);
      }
      new_mark = previous_top;

      // Sweep newly copied objects in the old space.  The promotion 'top'
      // pointer could change during sweeping.
      previous_top = promoted_top;
      for (Address current = promoted_mark - kPointerSize;
           current >= previous_top;
           current -= kPointerSize) {
        HeapObject* object = HeapObject::cast(Memory::Object_at(current));
        object->Iterate(&copy_visitor);
        UpdateRSet(object);
      }
      promoted_mark = previous_top;
    }

上面几个指针的作用如果还不好理解,可以看下浅谈V8引擎中的垃圾回收机制里面的例子,写得比较清楚。

Mark-Compact

Mark-Compact算法主要包含两个阶段:

1.标记阶段,找到并标记所有live objects
2.整理/清除阶段,在此阶段会通过将live objects复制到一个新的连续空间的方式对堆内存进行整理
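其中标记阶段的思想,可以用一个对象图上的遍历来示意(MarkLive为假想函数,对象图用邻接表表示;v8中借from_space作marking stack,这里用std::vector代替):

```cpp
#include <cassert>
#include <vector>

// 标记阶段的示意:从根出发深度优先遍历对象图,标记所有 live objects
std::vector<bool> MarkLive(int num_objects,
                           const std::vector<std::vector<int>>& edges,
                           const std::vector<int>& roots) {
  std::vector<bool> marked(num_objects, false);
  std::vector<int> stack(roots.begin(), roots.end());  // marking stack
  while (!stack.empty()) {
    int obj = stack.back();
    stack.pop_back();
    if (marked[obj]) continue;
    marked[obj] = true;  // 对应 set_mark
    for (int child : edges[obj]) stack.push_back(child);  // 子引用入栈
  }
  return marked;  // 未标记者即在清除/整理阶段被回收
}
```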

Mark-Compact代码如下:

void Heap::MarkCompact() {
  gc_state_ = MARK_COMPACT;
#ifdef DEBUG
  mc_count_++;
#endif
  LOG(ResourceEvent("markcompact", "begin"));

  MarkCompactPrologue();

  MarkCompactCollector::CollectGarbage();

  MarkCompactEpilogue();

  LOG(ResourceEvent("markcompact", "end"));

  gc_state_ = NOT_IN_GC;

  Shrink();

  Counters::objs_since_last_full.Set(0);
}

这里主要做了垃圾回收的序幕(设置标记位等)、垃圾回收、垃圾回收的收尾(与序幕对应)、空间压缩这几件事,下面将对其进行详细介绍。

MarkCompactPrologue

MarkCompactPrologue代码如下:

void Heap::MarkCompactPrologue() {
  // 清除缓存的编译代码
  RegExpImpl::OldSpaceCollectionPrologue();
  // 对当前线程中所有栈帧中的所有stackhandler进行操作,将其程序计数器(PC)设置为偏移量,代替原来的绝对地址
  Top::MarkCompactPrologue();
  // 对所有线程执行上述操作
  ThreadManager::MarkCompactPrologue();
}

这里主要做了两件事:

1.清除缓存的编译代码
2.对所有线程的栈帧中的所有stack handle重新设置其pc_address(绝对值改成偏移量)

Top::MarkCompactPrologue();调用的代码如下:

void Top::MarkCompactPrologue(ThreadLocalTop* thread) {
  StackFrame::CookFramesForThread(thread);
}

void StackFrame::CookFramesForThread(ThreadLocalTop* thread) {
  ASSERT(!thread->stack_is_cooked());
  for (StackFrameIterator it(thread); !it.done(); it.Advance()) {
    it.frame()->Cook();
  }
  thread->set_stack_is_cooked(true);
}

void StackFrame::Cook() {
  Code* code = FindCode();
  for (StackHandlerIterator it(this, top_handler()); !it.done(); it.Advance()) {
    it.handler()->Cook(code);
  }
  ASSERT(code->contains(pc()));
  set_pc(AddressFrom<Address>(pc() - code->instruction_start()));
}

void StackHandler::Cook(Code* code) {
  ASSERT(code->contains(pc()));
  set_pc(AddressFrom<Address>(pc() - code->instruction_start()));
}

我们从上述代码可以看到,最终调用了StackHandler::set_pc来设置其pc_address。
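Cook这步操作的意义在于:GC可能移动Code对象,所以先把pc从绝对地址改存为相对instruction_start的偏移,对象搬移完成后再加上新的起始地址恢复绝对pc。用一段示意表达(Cook/Uncook为说明用的简化版,非v8真实签名):

```cpp
#include <cassert>
#include <cstdint>

// 绝对地址 -> 相对 instruction_start 的偏移(GC 前)
uintptr_t Cook(uintptr_t pc, uintptr_t instruction_start) {
  return pc - instruction_start;
}

// 偏移 -> 新的绝对地址(Code 对象搬移后)
uintptr_t Uncook(uintptr_t offset, uintptr_t new_instruction_start) {
  return new_instruction_start + offset;
}
```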

MarkCompactCollector::CollectGarbage

这里进入到了MarkCompact的垃圾回收阶段,代码如下:

void MarkCompactCollector::CollectGarbage() {
  // 一些准备工作
  Prepare();

  // 从root object开始深度优先遍历,标记live objects
  MarkLiveObjects();

  // 清理LargeObjectSpace,释放un_marked的LargeObject
  SweepLargeObjectSpace();

  if (compacting_collection_) {
    // 分配空间、在old_object的map指针或from_space的对应位置记录new_object地址
    EncodeForwardingAddresses();

    // 更新所有指向live_objects的pointer,使其指向新对象地址
    UpdatePointers();

    // 把原对象内存拷贝至新的对象内存
    // 利用memmove
    RelocateObjects();

    // 重置remember_set
    // remember_set are sparse, faster (eg, binary) search for set bits
    RebuildRSets();

  } else {
    // 回收没有被标记的内存
    SweepSpaces();
  }

  Finish();
}

这里主要做了如下几件事:

1.获取compacting_collection_,决定后面的过程是否需要内存整理
2.从root object开始深度优先遍历,标记live objects
3.清理LargeObjectSpace,释放un_marked的LargeObject
4.判断是否需要整理,也就是`compacting_collection_`是否为`true`
	a.需要整理
		i.分配空间、在old_object的map指针或from_space的对应位置记录new_object地址
		ii.更新所有指向live_objects的pointer,使其指向新对象地址
		iii.用memmove方法把原对象内存拷贝至新的对象内存
	iv.重置remember_set(remember_set结构稀疏,便于快速检索置位的bit)
	b.不需要整理,只是清除即可
		i.回收没有被标记的内存
5.Finish,清空StubCache。

这里分阶段详细讲解下:

Prepare

Prepare用来获取现有状态是否需要进行内存整理,还是只回收就好。代码如下:

void MarkCompactCollector::Prepare() {
  static const int kFragmentationLimit = 50;  // Percent.
#ifdef DEBUG
  ASSERT(state_ == IDLE);
  state_ = PREPARE_GC;
#endif
  ASSERT(!FLAG_always_compact || !FLAG_never_compact);

  compacting_collection_ = FLAG_always_compact;

  // We compact the old generation if it gets too fragmented (ie, we could
  // recover an expected amount of space by reclaiming the waste and free
  // list blocks).  We always compact when the flag --gc-global is true
  // because objects do not get promoted out of new space on non-compacting
  // GCs.
  // 碎片化严重时进行compact
  // 当--gc-global为true时,进行compact
  if (!compacting_collection_) {
    // 可恢复的空间
    // 整个空间有 Size(已用) + Waste(浪费) + AvailableFree(剩余) 组成
    int old_gen_recoverable = Heap::old_space()->Waste()
                            + Heap::old_space()->AvailableFree()
                            + Heap::code_space()->Waste()
                            + Heap::code_space()->AvailableFree();
    int old_gen_used = old_gen_recoverable
                     + Heap::old_space()->Size()
                     + Heap::code_space()->Size();
    int old_gen_fragmentation = (old_gen_recoverable * 100) / old_gen_used;
    // old_gen_fragmentation > 50
    if (old_gen_fragmentation > kFragmentationLimit) {
      compacting_collection_ = true;
    }
  }

  if (FLAG_never_compact) compacting_collection_ = false;

#ifdef DEBUG
  if (compacting_collection_) {
    // We will write bookkeeping information to the remembered set area
    // starting now.
    // page设置成NOT_IN_USE
    Page::set_rset_state(Page::NOT_IN_USE);
  }
#endif

  Heap::map_space()->PrepareForMarkCompact(compacting_collection_);
  Heap::old_space()->PrepareForMarkCompact(compacting_collection_);
  Heap::code_space()->PrepareForMarkCompact(compacting_collection_);

  Counters::global_objects.Set(0);

#ifdef DEBUG
  live_bytes_ = 0;
  live_young_objects_ = 0;
  live_old_objects_ = 0;
  live_immutable_objects_ = 0;
  live_map_objects_ = 0;
  live_lo_objects_ = 0;
#endif
}

这里在碎片化严重时才进行compact:具体计算 可回收空间/总空间 的比例,当其大于50%(kFragmentationLimit)时,启用compact。
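该碎片率计算可以用如下示意表达(ShouldCompact为假想命名,三个参数分别对应上面代码里的Waste、AvailableFree与Size之和):

```cpp
#include <cassert>

// Prepare 中碎片率判断的示意:
// 可回收空间(Waste + AvailableFree)占总空间的百分比超过 50 则触发 compact
bool ShouldCompact(int waste, int available_free, int used_size) {
  int recoverable = waste + available_free;       // old_gen_recoverable
  int total = recoverable + used_size;            // old_gen_used
  int fragmentation = recoverable * 100 / total;  // old_gen_fragmentation
  return fragmentation > 50;                      // kFragmentationLimit
}
```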

MarkLiveObjects

MarkLiveObjects用来标记所有的live object,从root object开始深度优先遍历,代码如下:

// 遍历,为live object标记mark
void MarkCompactCollector::MarkLiveObjects() {
#ifdef DEBUG
  ASSERT(state_ == PREPARE_GC);
  state_ = MARK_LIVE_OBJECTS;
#endif
  // The to space contains live objects, the from space is used as a marking
  // stack.
  marking_stack.Initialize(Heap::new_space()->FromSpaceLow(),
                           Heap::new_space()->FromSpaceHigh());

  // 返回marking_stack.is_overflow
  ASSERT(!marking_stack.overflowed());

  // Mark the heap roots, including global variables, stack variables, etc.
  // 遍历根对象,set_mark & push stack
  MarkingVisitor marking_visitor;
  Heap::IterateStrongRoots(&marking_visitor);

  // Take care of the symbol table specially.
  SymbolTable* symbol_table = SymbolTable::cast(Heap::symbol_table());
#ifdef DEBUG
  UpdateLiveObjectCount(symbol_table);
#endif

  // 1. mark the prefix of the symbol table and push the objects on
  // the stack.
  symbol_table->IteratePrefix(&marking_visitor);
  // 2. mark the symbol table without pushing it on the stack.
  set_mark(symbol_table);  // map word is changed.

  bool has_processed_weak_pointers = false;

  // Mark objects reachable from the roots.
  while (true) {
    // 深度优先遍历,标记、入栈
    MarkObjectsReachableFromTopFrame();

    if (!marking_stack.overflowed()) {
      if (has_processed_weak_pointers) break;
      // First we mark weak pointers not yet reachable.
      GlobalHandles::MarkWeakRoots(&MustBeMarked);
      // Then we process weak pointers and process the transitive closure.
      GlobalHandles::IterateWeakRoots(&marking_visitor);
      has_processed_weak_pointers = true;
      continue;
    }

    // The marking stack overflowed, we need to rebuild it by scanning the
    // whole heap.
    marking_stack.clear_overflowed();

    // We have early stops if the stack overflowed again while scanning
    // overflowed objects in a space.
    SemiSpaceIterator new_it(Heap::new_space(), &OverflowObjectSize);
    ScanOverflowedObjects(&new_it);
    if (marking_stack.overflowed()) continue;

    HeapObjectIterator old_it(Heap::old_space(), &OverflowObjectSize);
    ScanOverflowedObjects(&old_it);
    if (marking_stack.overflowed()) continue;

    HeapObjectIterator code_it(Heap::code_space(), &OverflowObjectSize);
    ScanOverflowedObjects(&code_it);
    if (marking_stack.overflowed()) continue;

    HeapObjectIterator map_it(Heap::map_space(), &OverflowObjectSize);
    ScanOverflowedObjects(&map_it);
    if (marking_stack.overflowed()) continue;

    LargeObjectIterator lo_it(Heap::lo_space(), &OverflowObjectSize);
    ScanOverflowedObjects(&lo_it);
  }

  // Prune the symbol table removing all symbols only pointed to by
  // the symbol table.
  SymbolTableCleaner v;
  symbol_table->IterateElements(&v);
  symbol_table->ElementsRemoved(v.PointersRemoved());

#ifdef DEBUG
  if (FLAG_verify_global_gc) VerifyHeapAfterMarkingPhase();
#endif

  // Remove object groups after marking phase.
  GlobalHandles::RemoveObjectGroups();

  // Objects in the active semispace of the young generation will be relocated
  // to the inactive semispace.  Set the relocation info to the beginning of
  // the inactive semispace.
  Heap::new_space()->MCResetRelocationInfo();
}

void MarkCompactCollector::MarkObjectsReachableFromTopFrame() {
  MarkingVisitor marking_visitor;
  do {
    while (!marking_stack.is_empty()) {
      // marking_stack出栈
      HeapObject* obj = marking_stack.Pop();
      ASSERT(Heap::Contains(obj));
      ASSERT(is_marked(obj) && !is_overflowed(obj));

      // Because the object is marked, the map pointer is not tagged as a
      // normal HeapObject pointer, we need to recover the map pointer,
      // then use the map pointer to mark the object body.
      intptr_t map_word = reinterpret_cast<intptr_t>(obj->map());
      Map* map = reinterpret_cast<Map*>(clear_mark_bit(map_word));
      MarkObject(map);
      // 遍历其子节点
      obj->IterateBody(map->instance_type(), obj->SizeFromMap(map),
                       &marking_visitor);
    };
    // Check objects in object groups.
    MarkObjectGroups(&marking_visitor);
  } while (!marking_stack.is_empty());
}

这里主要做了三件事:

1.初始化栈(深度优先遍历使用),这里直接使用new_space的from_space作为标记所用的栈
2.遍历根节点相连对象,标记、入栈
3.将栈中对象一个个pop出来,进行深度优先遍历标记
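上面的标记循环可以用一段简化的 Python 来模拟(仅为示意:对象图用 dict 表示,marked/overflowed 用集合代替真实的标记位,栈容量为假设值),重点体现"栈满时只标记不入栈、之后重扫把溢出对象重新入栈"的过程:

```python
# 示意:标记阶段的主循环,栈满时只置溢出标记,之后"重扫"把溢出对象重新入栈
def mark(heap, roots, stack_limit):
    marked, overflowed, stack = set(), set(), []

    def push(obj):
        if obj in marked:
            return
        marked.add(obj)
        if len(stack) < stack_limit:
            stack.append(obj)
        else:
            overflowed.add(obj)          # 对应 marking_stack 溢出

    for root in roots:                   # 对应遍历根节点,标记、入栈
        push(root)
    while True:
        while stack:                     # 深度优先处理栈中对象
            obj = stack.pop()
            for child in heap[obj]:      # 对应 IterateBody 遍历子节点
                push(child)
        if not overflowed:
            break
        for obj in list(overflowed):     # 对应 ScanOverflowedObjects 重扫
            if len(stack) < stack_limit:
                overflowed.discard(obj)
                stack.append(obj)
    return marked
```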

标记的具体代码如下:

// Mark object pointed to by p.
  void MarkObjectByPointer(Object** p) {
    Object* obj = *p;
    if (!obj->IsHeapObject()) return;

    // Optimization: Bypass ConsString object where right size is
    // Heap::empty_string().
    // Please note this checks performed equals:
    //   object->IsConsString() &&
    //   (ConsString::cast(object)->second() == Heap::empty_string())
    // except the map for the object might be marked.
    intptr_t map_word =
        reinterpret_cast<intptr_t>(HeapObject::cast(obj)->map());
    uint32_t tag =
        (reinterpret_cast<Map*>(clear_mark_bit(map_word)))->instance_type();
    if ((tag < FIRST_NONSTRING_TYPE) &&
        (kConsStringTag ==
         static_cast<StringRepresentationTag>(tag &
                                              kStringRepresentationMask)) &&
        (Heap::empty_string() ==
         reinterpret_cast<String*>(
             reinterpret_cast<ConsString*>(obj)->second()))) {
      // Since we don't have the object start it is impossible to update the
      // remeber set quickly.  Therefore this optimization only is taking
      // place when we can avoid changing.
      Object* first = reinterpret_cast<ConsString*>(obj)->first();
      if (Heap::InNewSpace(obj) || !Heap::InNewSpace(first)) {
        obj = first;
        *p = obj;
      }
    }
    MarkCompactCollector::MarkObject(HeapObject::cast(obj));
  }

这里有一个针对ConsString的优化:当ConsString的second部分等于Heap::empty_string()时,直接用其first部分替换该对象,避免多标记一层无意义的包装;之后调用MarkObject进行标记,标记位记录在obj的map pointer上,然后入栈。
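标记位存放在对象 map 指针上的做法,可以用下面几行 Python 示意(MARK_BIT 的位值是演示用的假设,并非 V8 的真实内存布局):

```python
# 示意:把 mark bit 记在对象的 map 指针(map word)低位上
MARK_BIT = 0x1  # 假设用最低位做标记位

def set_mark(map_word):
    return map_word | MARK_BIT

def clear_mark_bit(map_word):
    # 对应源码中的 clear_mark_bit:恢复真正的 map 指针
    return map_word & ~MARK_BIT

def is_marked(map_word):
    return (map_word & MARK_BIT) == MARK_BIT
```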

compact

当需要进行整理时,主要进行如下几个步骤:

EncodeForwardingAddresses

EncodeForwardingAddresses用来分配空间并在old_object的map指针或from_space的对应位置记录new_object地址。代码如下

void MarkCompactCollector::EncodeForwardingAddresses() {
  ASSERT(state_ == ENCODE_FORWARDING_ADDRESSES);
  // Compute the forwarding pointers in each space.
  // 分配新空间、在old_object的map指针上记录new_object的地址
  // &mc_allocation_info记录分配信息(top等)
  EncodeForwardingAddressesInPagedSpace<MCAllocateFromOldSpace,
                                        IgnoreNonLiveObject>(
      Heap::old_space());

  EncodeForwardingAddressesInPagedSpace<MCAllocateFromCodeSpace,
                                        LogNonLiveCodeObject>(
      Heap::code_space());

  // Compute new space next to last after the old and code spaces have been
  // compacted.  Objects in new space can be promoted to old or code space.
  // 这里新生代对象有可能提升到老生代
  // 需要注意的是,new space使用from_space来帮助记录新对象地址:to_space和from_space空间大小相同,因此可以用from_space相同offset的位置记录相应的new_object地址
  EncodeForwardingAddressesInNewSpace();

  // Compute map space last because computing forwarding addresses
  // overwrites non-live objects.  Objects in the other spaces rely on
  // non-live map pointers to get the sizes of non-live objects.
  EncodeForwardingAddressesInPagedSpace<MCAllocateFromMapSpace,
                                        IgnoreNonLiveObject>(
      Heap::map_space());

  // Write relocation info to the top page, so we can use it later.  This is
  // done after promoting objects from the new space so we get the correct
  // allocation top.
  Heap::old_space()->MCWriteRelocationInfoToPage();
  Heap::code_space()->MCWriteRelocationInfoToPage();
  Heap::map_space()->MCWriteRelocationInfoToPage();
}



这里分别对所有内存space进行操作。下面主要介绍对old_space和new_space的操作。

对old_space的分配、标记的代码如下:

template<MarkCompactCollector::AllocationFunction Alloc,
         MarkCompactCollector::ProcessNonLiveFunction ProcessNonLive>
void MarkCompactCollector::EncodeForwardingAddressesInPagedSpace(
    PagedSpace* space) {
  PageIterator it(space, PageIterator::PAGES_IN_USE);
  while (it.has_next()) {
    Page* p = it.next();
    // The offset of each live object in the page from the first live object
    // in the page.
    int offset = 0;
    // 为marked Object分配新内存,在老对象中存储新对象偏移量
    // 以Page(内存页)为单位进行处理
    EncodeForwardingAddressesInRange<Alloc,
                                     EncodeForwardingAddressInPagedSpace,
                                     ProcessNonLive>(
        p->ObjectAreaStart(),
        p->AllocationTop(),
        &offset);
  }
}

这里循环遍历每一页,对其执行EncodeForwardingAddressesInRange方法。具体代码如下:

// Function template that, given a range of addresses (eg, a semispace or a
// paged space page), iterates through the objects in the range to clear
// mark bits and compute and encode forwarding addresses.  As a side effect,
// maximal free chunks are marked so that they can be skipped on subsequent
// sweeps.
//
// The template parameters are an allocation function, a forwarding address
// encoding function, and a function to process non-live objects.
template<MarkCompactCollector::AllocationFunction Alloc,
         MarkCompactCollector::EncodingFunction Encode,
         MarkCompactCollector::ProcessNonLiveFunction ProcessNonLive>
inline void EncodeForwardingAddressesInRange(Address start,
                                             Address end,
                                             int* offset) {
  // The start address of the current free region while sweeping the space.
  // This address is set when a transition from live to non-live objects is
  // encountered.  A value (an encoding of the 'next free region' pointer)
  // is written to memory at this address when a transition from non-live to
  // live objects is encountered.
  Address free_start = NULL;

  // A flag giving the state of the previously swept object.  Initially true
  // to ensure that free_start is initialized to a proper address before
  // trying to write to it.
  bool is_prev_alive = true;

  int object_size;  // Will be set on each iteration of the loop.
  for (Address current = start; current < end; current += object_size) {
    HeapObject* object = HeapObject::FromAddress(current);
    if (is_marked(object)) {
      clear_mark(object);
      object_size = object->Size();

      Object* forwarded = Alloc(object, object_size);
      // Allocation cannot fail, because we are compacting the space.
      ASSERT(!forwarded->IsFailure());
      Encode(object, object_size, forwarded, offset);

#ifdef DEBUG
      if (FLAG_gc_verbose) {
        PrintF("forward %p -> %p.\n", object->address(),
               HeapObject::cast(forwarded)->address());
      }
#endif
      if (!is_prev_alive) {  // Transition from non-live to live.
        EncodeFreeRegion(free_start, current - free_start);
        is_prev_alive = true;
      }
    } else {  // Non-live object.
      object_size = object->Size();
      ProcessNonLive(object);
      if (is_prev_alive) {  // Transition from live to non-live.
        free_start = current;
        is_prev_alive = false;
      }
    }
  }

  // If we ended on a free region, mark it.
  if (!is_prev_alive) EncodeFreeRegion(free_start, end - free_start);
}


EncodeForwardingAddressesInRange中主要做了两件事:

1.分配新内存 Alloc
2.记录新对象地址,Encode

其中Encode调用了传入模版的EncodeForwardingAddressInPagedSpace方法在old_object中标记new_object的地址,代码如下:

// The forwarding address is encoded in the map pointer of the object as an
// offset (in terms of live bytes) from the address of the first live object
// in the page.
// forwarding addres用距离当前页第一个live object的距离来表示
// 存储在对象的map pointer中
inline void EncodeForwardingAddressInPagedSpace(HeapObject* old_object,
                                                int object_size,
                                                Object* new_object,
                                                int* offset) {
  // Record the forwarding address of the first live object if necessary.
  if (*offset == 0) {
    Page::FromAddress(old_object->address())->mc_first_forwarded =
        HeapObject::cast(new_object)->address();
  }

  uint32_t encoded = EncodePointers(old_object->map()->address(), *offset);
  old_object->set_map(reinterpret_cast<Map*>(encoded));
  *offset += object_size;
  ASSERT(*offset <= Page::kObjectAreaSize);
}

这里需要注意的是:在old_object所在页的mc_first_forwarded属性上记录的是该页第一个live object分配到的新地址;每个old_object的map指针里只记录自己的new_object相对第一个live object的new_object的偏移量(forwarding address),同时forwarding address中还编码了当前页的一些信息(page_index等)。
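这种"页上存第一个新地址、对象里只存偏移量"的编码方式可以用如下 Python 示意(地址、大小均为假设值,并省略了 forwarding address 中页信息的编码):

```python
# 示意:old space 中 forwarding address 的编码与解码
def encode_page(live_objects, new_base):
    """live_objects: [(old_addr, size), ...] 按地址排序;new_base: 第一个新对象地址"""
    offsets = {}
    offset = 0
    for old_addr, size in live_objects:
        offsets[old_addr] = offset   # 对应写入 old_object 的 map 指针
        offset += size               # 偏移按 live bytes 累加
    mc_first_forwarded = new_base    # 对应 Page::mc_first_forwarded
    return mc_first_forwarded, offsets

def decode(mc_first_forwarded, offsets, old_addr):
    # 新地址 = 第一个新对象地址 + 偏移(这里未考虑跨页的情况)
    return mc_first_forwarded + offsets[old_addr]
```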

对新生代分配、标记的代码如下:

// Functions to encode the forwarding pointers in each compactable space.
void MarkCompactCollector::EncodeForwardingAddressesInNewSpace() {
  int ignored;
  EncodeForwardingAddressesInRange<MCAllocateFromNewSpace,
                                   EncodeForwardingAddressInNewSpace,
                                   IgnoreNonLiveObject>(
      Heap::new_space()->bottom(),
      Heap::new_space()->top(),
      &ignored);
}

EncodeForwardingAddressesInRange与old_space中的操作一样,分配和记录两项工作,只不过对应函数不同。

新生代内存分配之前讲过,这里与老生代有一点不同的是会优先通过对象晋升来分配新内存,代码如下:

// Try to promote all objects in new space.  Heap numbers and sequential
// strings are promoted to the code space, all others to the old space.
inline Object* MCAllocateFromNewSpace(HeapObject* object, int object_size) {
  bool has_pointers = !object->IsHeapNumber() && !object->IsSeqString();
  Object* forwarded = has_pointers ?
      Heap::old_space()->MCAllocateRaw(object_size) :
      Heap::code_space()->MCAllocateRaw(object_size);

  if (forwarded->IsFailure()) {
    forwarded = Heap::new_space()->MCAllocateRaw(object_size);
  }
  return forwarded;
}

这里可以看到,会优先尝试晋升:含指针的对象分配到old_space,HeapNumber和顺序字符串分配到code_space;晋升失败才会退回new_space分配。
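这个"优先晋升、失败回退"的分配顺序可以简化成如下 Python 示意(space 用 dict 模拟 top/limit,数值均为假设,且把 old_space 与 code_space 的选择简化成了单一的 old_space):

```python
# 示意:新生代对象优先晋升的分配顺序
def mc_allocate_from_new_space(size, old_space, new_space):
    for space in (old_space, new_space):     # 先尝试晋升,失败再留在新生代
        if space['top'] + size <= space['limit']:
            addr = space['top']
            space['top'] += size
            return addr, space
    raise MemoryError('allocation failed')
```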

新生代old_object记录new_object地址的方式也跟old_space不同,主要看如下代码:

// The forwarding address is encoded at the same offset as the current
// to-space object, but in from space.
// 在from_space的相同offset的位置记录new_object的地址
// new_object可能提升到老生代,也可能还在新生代
inline void EncodeForwardingAddressInNewSpace(HeapObject* old_object,
                                              int object_size,
                                              Object* new_object,
                                              int* ignored) {
  int offset =
      Heap::new_space()->ToSpaceOffsetForAddress(old_object->address());
  Memory::Address_at(Heap::new_space()->FromSpaceLow() + offset) =
      HeapObject::cast(new_object)->address();
}

这里可以看到,记录方式是借用from_space:由于to_space和from_space空间大小相同,from_space中相同offset的位置正好可以存放对应old_object的new_object地址。
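这种利用 from_space 镜像位置存 forwarding address 的做法,可以用如下 Python 类来示意(地址均为假设值,memory 用 dict 模拟 from_space 的内存):

```python
# 示意:新生代在 from_space 的相同 offset 处记录 new_object 地址
class NewSpaceForwarding:
    def __init__(self, to_space_low, from_space_low):
        self.to_low = to_space_low
        self.from_low = from_space_low
        self.memory = {}                 # 用 dict 模拟 from_space 的内存

    def encode(self, old_addr, new_addr):
        offset = old_addr - self.to_low  # 对应 ToSpaceOffsetForAddress
        self.memory[self.from_low + offset] = new_addr

    def lookup(self, old_addr):
        offset = old_addr - self.to_low
        return self.memory[self.from_low + offset]
```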

UpdatePointers

UpdatePointers代码如下:

void MarkCompactCollector::UpdatePointers() {
#ifdef DEBUG
  ASSERT(state_ == ENCODE_FORWARDING_ADDRESSES);
  state_ = UPDATE_POINTERS;
#endif
  UpdatingVisitor updating_visitor;
  Heap::IterateRoots(&updating_visitor);
  GlobalHandles::IterateWeakRoots(&updating_visitor);

  int live_maps = IterateLiveObjects(Heap::map_space(),
                                     &UpdatePointersInOldObject);
  int live_olds = IterateLiveObjects(Heap::old_space(),
                                     &UpdatePointersInOldObject);
  int live_immutables = IterateLiveObjects(Heap::code_space(),
                                           &UpdatePointersInOldObject);
  int live_news = IterateLiveObjects(Heap::new_space(),
                                     &UpdatePointersInNewObject);

  // Large objects do not move, the map word can be updated directly.
  LargeObjectIterator it(Heap::lo_space());
  while (it.has_next()) UpdatePointersInNewObject(it.next());

  USE(live_maps);
  USE(live_olds);
  USE(live_immutables);
  USE(live_news);

#ifdef DEBUG
  ASSERT(live_maps == live_map_objects_);
  ASSERT(live_olds == live_old_objects_);
  ASSERT(live_immutables == live_immutable_objects_);
  ASSERT(live_news == live_young_objects_);

  if (FLAG_verify_global_gc) VerifyHeapAfterUpdatingPointers();
#endif
}

这里其实是更新所有指向live_object的pointer,使其指向新地址:

1.遍历root object,更新其指针指向新分配的对象
2.遍历所有space,更新其指针

这里挑一些重要的点给大家讲解一下:

获取新地址并更新的操作在MarkCompactCollector::UpdatePointer中,代码如下:

// 获取新地址并更新
void MarkCompactCollector::UpdatePointer(Object** p) {
  // We need to check if p is in to_space.
  if (!(*p)->IsHeapObject()) return;

  HeapObject* obj = HeapObject::cast(*p);
  Address old_addr = obj->address();
  Address new_addr;

  ASSERT(!Heap::InFromSpace(obj));

  if (Heap::new_space()->Contains(obj)) {
    Address f_addr = Heap::new_space()->FromSpaceLow() +
                     Heap::new_space()->ToSpaceOffsetForAddress(old_addr);
    new_addr = Memory::Address_at(f_addr);

#ifdef DEBUG
    ASSERT(Heap::old_space()->Contains(new_addr) ||
           Heap::code_space()->Contains(new_addr) ||
           Heap::new_space()->FromSpaceContains(new_addr));

    if (Heap::new_space()->FromSpaceContains(new_addr)) {
      ASSERT(Heap::new_space()->FromSpaceOffsetForAddress(new_addr) <=
             Heap::new_space()->ToSpaceOffsetForAddress(old_addr));
    }
#endif

  } else if (Heap::lo_space()->Contains(obj)) {
    // Don't move objects in the large object space.
    new_addr = obj->address();

  } else {
    ASSERT(Heap::old_space()->Contains(obj) ||
           Heap::code_space()->Contains(obj) ||
           Heap::map_space()->Contains(obj));

    new_addr = GetForwardingAddressInOldSpace(obj);
    ASSERT(Heap::old_space()->Contains(new_addr) ||
           Heap::code_space()->Contains(new_addr) ||
           Heap::map_space()->Contains(new_addr));

#ifdef DEBUG
    if (Heap::old_space()->Contains(obj)) {
      ASSERT(Heap::old_space()->MCSpaceOffsetForAddress(new_addr) <=
             Heap::old_space()->MCSpaceOffsetForAddress(old_addr));
    } else if (Heap::code_space()->Contains(obj)) {
      ASSERT(Heap::code_space()->MCSpaceOffsetForAddress(new_addr) <=
             Heap::code_space()->MCSpaceOffsetForAddress(old_addr));
    } else {
      ASSERT(Heap::map_space()->MCSpaceOffsetForAddress(new_addr) <=
             Heap::map_space()->MCSpaceOffsetForAddress(old_addr));
    }
#endif
  }

  *p = HeapObject::FromAddress(new_addr);

#ifdef DEBUG
  if (FLAG_gc_verbose) {
    PrintF("update %p : %p -> %p\n",
           reinterpret_cast<Address>(p), old_addr, new_addr);
  }
#endif
}

这里更新指针的操作直接赋值即可,关键在于获取新地址,其过程如下:

1.如果是新生代对象,直接从from_space中相同offset的地方获取就好
2.老生代通过GetForwardingAddressInOldSpace方法获取

GetForwardingAddressInOldSpace代码如下:

Address MarkCompactCollector::GetForwardingAddressInOldSpace(HeapObject* obj) {
  // Object should either in old or map space.
  uint32_t encoded = reinterpret_cast<uint32_t>(obj->map());

  // Offset to the first live object's forwarding address.
  int offset = DecodeOffset(encoded);
  Address obj_addr = obj->address();

  // Find the first live object's forwarding address.
  Page* p = Page::FromAddress(obj_addr);
  Address first_forwarded = p->mc_first_forwarded;

  // Page start address of forwarded address.
  Page* forwarded_page = Page::FromAddress(first_forwarded);
  int forwarded_offset = forwarded_page->Offset(first_forwarded);

  // Find end of allocation of in the page of first_forwarded.
  Address mc_top = forwarded_page->mc_relocation_top;
  int mc_top_offset = forwarded_page->Offset(mc_top);

  // Check if current object's forward pointer is in the same page
  // as the first live object's forwarding pointer
  // 在当前页
  if (forwarded_offset + offset < mc_top_offset) {
    // In the same page.
    return first_forwarded + offset;
  }

  // 不在当前页属时,顺延至下一页
  // Must be in the next page, NOTE: this may cross chunks.
  Page* next_page = forwarded_page->next_page();
  ASSERT(next_page->is_valid());

  offset -= (mc_top_offset - forwarded_offset);
  offset += Page::kObjectStartOffset;

  ASSERT_PAGE_OFFSET(offset);
  ASSERT(next_page->OffsetToAddress(offset) < next_page->mc_relocation_top);

  return next_page->OffsetToAddress(offset);
}

这里做了如下几件事:

1.获取当前page的mc_first_forwarded,也就是新分配的第一个对象地址
2.取出对应offset
3.判断是否在一页当中
	a.在一页中,直接返回first_forwarded + offset就好
	b.不在一页中(forwarded_offset + offset超过了该页已分配的上限mc_top_offset),说明新对象被分配到了下一页,此时需要先修正offset,再用next_page->OffsetToAddress(offset)获取地址
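上述解码过程(含跨页的情况)可以用一段简化的 Python 来示意(参数均为假设值,并省略了 Page::kObjectStartOffset 等页头偏移的修正):

```python
# 示意:按 GetForwardingAddressInOldSpace 的思路解码新地址
def get_forwarding_address(offset, first_forwarded, page_start, mc_top, next_page_start):
    """offset: 相对第一个 live object 的 live-bytes 偏移;mc_top: 本页新分配的上限"""
    forwarded_offset = first_forwarded - page_start
    mc_top_offset = mc_top - page_start
    if forwarded_offset + offset < mc_top_offset:
        return first_forwarded + offset          # 仍在同一页
    # 超出本页:扣掉本页能容纳的部分,顺延到下一页
    offset -= (mc_top_offset - forwarded_offset)
    return next_page_start + offset
```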
RelocateObjects

RelocateObjects将原对象的内容拷贝到新分配的内存中,代码如下:

void MarkCompactCollector::RelocateObjects() {
#ifdef DEBUG
  ASSERT(state_ == UPDATE_POINTERS);
  state_ = RELOCATE_OBJECTS;
#endif
  // Relocates objects, always relocate map objects first. Relocating
  // objects in other space relies on map objects to get object size.
  int live_maps = IterateLiveObjects(Heap::map_space(), &RelocateMapObject);
  int live_olds = IterateLiveObjects(Heap::old_space(), &RelocateOldObject);
  int live_immutables =
      IterateLiveObjects(Heap::code_space(), &RelocateCodeObject);
  int live_news = IterateLiveObjects(Heap::new_space(), &RelocateNewObject);

  USE(live_maps);
  USE(live_olds);
  USE(live_immutables);
  USE(live_news);
#ifdef DEBUG
  ASSERT(live_maps == live_map_objects_);
  ASSERT(live_olds == live_old_objects_);
  ASSERT(live_immutables == live_immutable_objects_);
  ASSERT(live_news == live_young_objects_);
#endif

  // Notify code object in LO to convert IC target to address
  // This must happen after lo_space_->Compact
  LargeObjectIterator it(Heap::lo_space());
  while (it.has_next()) { ConvertCodeICTargetToAddress(it.next()); }

  // Flips from and to spaces
  Heap::new_space()->Flip();

  // Sets age_mark to bottom in to space
  Address mark = Heap::new_space()->bottom();
  Heap::new_space()->set_age_mark(mark);

  Heap::new_space()->MCCommitRelocationInfo();
#ifdef DEBUG
  // It is safe to write to the remembered sets as remembered sets on a
  // page-by-page basis after committing the m-c forwarding pointer.
  Page::set_rset_state(Page::IN_USE);
#endif
  Heap::map_space()->MCCommitRelocationInfo();
  Heap::old_space()->MCCommitRelocationInfo();
  Heap::code_space()->MCCommitRelocationInfo();

#ifdef DEBUG
  if (FLAG_verify_global_gc) VerifyHeapAfterRelocatingObjects();
#endif
}

这里对所有空间的对象进行遍历,然后进行复制,复制的代码如下:

int MarkCompactCollector::RelocateMapObject(HeapObject* obj) {
  // decode map pointer (forwarded address)
  uint32_t encoded = reinterpret_cast<uint32_t>(obj->map());
  Address map_addr = DecodeMapPointer(encoded, Heap::map_space());
  ASSERT(Heap::map_space()->Contains(HeapObject::FromAddress(map_addr)));

  // Get forwarding address before resetting map pointer
  Address new_addr = GetForwardingAddressInOldSpace(obj);

  // recover map pointer
  obj->set_map(reinterpret_cast<Map*>(HeapObject::FromAddress(map_addr)));

  // The meta map object may not be copied yet.
  Address old_addr = obj->address();

  if (new_addr != old_addr) {
    memmove(new_addr, old_addr, Map::kSize);  // copy contents
  }

#ifdef DEBUG
  if (FLAG_gc_verbose) {
    PrintF("relocate %p -> %p\n", old_addr, new_addr);
  }
#endif

  return Map::kSize;
}

主要做了两件事:

1.获取新地址(与上面讲解的获取新地址逻辑相同)
2.利用memmove方法对内存空间进行复制
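复制这一步在 Python 里可以用 bytearray 做一个等价示意(切片会先产生一份拷贝,因此与 memmove 一样,源、目标区间重叠也是安全的):

```python
# 示意:relocate 阶段把对象内容从旧地址拷到新地址(相当于 memmove)
def relocate(heap, old_addr, new_addr, size):
    """heap: bytearray;右侧切片先被求值为一份拷贝,再写入目标区间"""
    heap[new_addr:new_addr + size] = heap[old_addr:old_addr + size]
```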
RebuildRSets
void MarkCompactCollector::RebuildRSets() {
#ifdef DEBUG
  ASSERT(state_ == RELOCATE_OBJECTS);
  state_ = REBUILD_RSETS;
#endif
  Heap::RebuildRSets();
}

这里主要对remembered set进行重建(rebuild)。remembered set记录了老生代中指向新生代对象的指针位置,对象搬移之后需要重新生成,这样下次新生代GC时才能快速找到这些跨代引用。

SweepSpaces

SweepSpaces用于清理内存空间,但不会像compact那样重新整理;这里的工作同样是在标记的基础上进行的。SweepSpaces入口代码如下:

void MarkCompactCollector::SweepSpaces() {
  ASSERT(state_ == SWEEP_SPACES);
  ASSERT(!IsCompacting());
  // Noncompacting collections simply sweep the spaces to clear the mark
  // bits and free the nonlive blocks (for old and map spaces).  We sweep
  // the map space last because freeing non-live maps overwrites them and
  // the other spaces rely on possibly non-live maps to get the sizes for
  // non-live objects.
  SweepSpace(Heap::old_space(), &DeallocateOldBlock);
  SweepSpace(Heap::code_space(), &DeallocateCodeBlock);
  SweepSpace(Heap::new_space());
  SweepSpace(Heap::map_space(), &DeallocateMapBlock);
}

这里对各个空间分别执行SweepSpace操作,pagedSpace和newSpace对应不同的实现(函数重载)。

对于pagedSpace,SweepSpace代码如下:

static void SweepSpace(PagedSpace* space, DeallocateFunction dealloc) {
  PageIterator it(space, PageIterator::PAGES_IN_USE);
  // 遍历每一页
  while (it.has_next()) {
    Page* p = it.next();

    bool is_previous_alive = true;
    Address free_start = NULL;
    HeapObject* object;

    for (Address current = p->ObjectAreaStart();
         current < p->AllocationTop();
         current += object->Size()) {
      object = HeapObject::FromAddress(current);
      if (is_marked(object)) {
        clear_mark(object);
        if (MarkCompactCollector::IsCompacting() && object->IsCode()) {
          // If this is compacting collection marked code objects have had
          // their IC targets converted to objects.
          // They need to be converted back to addresses.
          Code::cast(object)->ConvertICTargetsFromObjectToAddress();
        }
        if (!is_previous_alive) {  // Transition from free to live.
          dealloc(free_start, current - free_start);
          is_previous_alive = true;
        }
      } else {
        if (object->IsCode()) {
          LOG(CodeDeleteEvent(Code::cast(object)->address()));
        }
        if (is_previous_alive) {  // Transition from live to free.
          free_start = current;
          is_previous_alive = false;
        }
      }
      // The object is now unmarked for the call to Size() at the top of the
      // loop.
    }

    // If the last region was not live we need to from free_start to the
    // allocation top in the page.
    if (!is_previous_alive) {
      int free_size = p->AllocationTop() - free_start;
      if (free_size > 0) {
        dealloc(free_start, free_size);
      }
    }
  }
}

这里遍历每一页中的每一个object:没有标记的对象需要清除,其所在的连续空闲区间会交给传入的DeallocateFunction处理。old_space传入的DeallocateOldBlock方法如下:

void MarkCompactCollector::DeallocateOldBlock(Address start,
                                              int size_in_bytes) {
  Heap::ClearRSetRange(start, size_in_bytes);
  Heap::old_space()->Free(start, size_in_bytes);
}

也就是清空空间,加入到free_list中。
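SweepSpace 中"连续非存活区间合并后释放"的逻辑,可以抽象成如下 Python 示例(对象用 (addr, size, marked) 元组表示,释放动作简化成收集 free list,数值均为假设):

```python
# 示意:sweep 阶段扫描一页,把连续的非存活区间合并后加入 free list
def sweep_page(objects, alloc_top):
    """objects: [(addr, size, marked), ...] 按地址排序;返回 (free_list, 存活对象地址)"""
    free_list, live = [], []
    free_start, prev_alive = None, True
    for addr, size, marked in objects:
        if marked:
            if not prev_alive:                        # 从非存活切回存活:结算一段空闲区间
                free_list.append((free_start, addr - free_start))
                prev_alive = True
            live.append(addr)
        else:
            if prev_alive:                            # 从存活切到非存活:记下空闲区间起点
                free_start, prev_alive = addr, False
    if not prev_alive:                                # 页尾还有一段空闲区间
        free_list.append((free_start, alloc_top - free_start))
    return free_list, live
```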

新生代的SweepSpace代码如下:

static void SweepSpace(NewSpace* space) {
  HeapObject* object;
  for (Address current = space->bottom();
       current < space->top();
       current += object->Size()) {
    object = HeapObject::FromAddress(current);
    if (is_marked(object)) {
      clear_mark(object);
    } else {
      // We give non-live objects a map that will correctly give their size,
      // since their existing map might not be live after the collection.
      // 更新对象map,因为其对应的map将在下面的sweepSpace中被释放
      int size = object->Size();
      if (size >= Array::kHeaderSize) {
        object->set_map(Heap::byte_array_map());
        ByteArray::cast(object)->set_length(ByteArray::LengthFor(size));
      } else {
        ASSERT(size == kPointerSize);
        object->set_map(Heap::one_word_filler_map());
      }
      ASSERT(object->Size() == size);
    }
    // The object is now unmarked for the call to Size() at the top of the
    // loop.
  }
}

这里直接更新对象对应的map指针,因为其对应的map将在下面的sweepSpace中被释放。

Finish

Finish用来清空StubCache。代码如下:

void MarkCompactCollector::Finish() {
#ifdef DEBUG
  ASSERT(state_ == SWEEP_SPACES || state_ == REBUILD_RSETS);
  state_ = IDLE;
#endif
  // The stub cache is not traversed during GC; clear the cache to
  // force lazy re-initialization of it. This must be done after the
  // GC, because it relies on the new address of certain old space
  // objects (empty string, illegal builtin).
  StubCache::Clear();
}

Stub一般会含有已优化的代码,来处理某个IC(内联缓存)之前所碰到的特定类型的操作。一旦Stub碰到了优化代码无法解决的操作,它会调用C++运行时代码来进行处理。运行时代码处理了这个操作之后,会生成一个新的Stub,包含解决这个操作的方案(当然也包括之前的其他方案)。

Shrink

Shrink用于空间的收缩,分别对map_space_、old_space_、code_space_进行操作,代码如下:

void Heap::Shrink() {
  // Try to shrink map, old, and code spaces.
  map_space_->Shrink();
  old_space_->Shrink();
  code_space_->Shrink();
}

最终都会调用PagedSpace::Shrink方法,代码如下:

void PagedSpace::Shrink() {
  // Release half of free pages.
  // 释放一半的空闲页
  Page* top_page = AllocationTopPage();
  ASSERT(top_page->is_valid());

  // Loop over the pages from the top page to the end of the space to count
  // the number of pages to keep and find the last page to keep.
  int free_pages = 0;
  int pages_to_keep = 0;  // Of the free pages.
  Page* last_page_to_keep = top_page;
  Page* current_page = top_page->next_page();
  // Loop over the pages to the end of the space.
  while (current_page->is_valid()) {
    // Keep every odd-numbered page, one page for every two in the space.
    if ((free_pages & 0x1) == 1) {
      pages_to_keep++;
      last_page_to_keep = last_page_to_keep->next_page();
    }
    free_pages++;
    current_page = current_page->next_page();
  }

  // Free pages after last_page_to_keep, and adjust the next_page link.
  Page* p = MemoryAllocator::FreePages(last_page_to_keep->next_page());
  MemoryAllocator::SetNextPage(last_page_to_keep, p);

  // Since pages are only freed in whole chunks, we may have kept more than
  // pages_to_keep.
  while (p->is_valid()) {
    pages_to_keep++;
    p = p->next_page();
  }

  // The difference between free_pages and pages_to_keep is the number of
  // pages actually freed.
  ASSERT(pages_to_keep <= free_pages);
  int bytes_freed = (free_pages - pages_to_keep) * Page::kObjectAreaSize;
  accounting_stats_.ShrinkSpace(bytes_freed);

  ASSERT(Capacity() == CountTotalPages() * Page::kObjectAreaSize);
}

这里其实是释放掉了pagedSpace中一半的空闲页(每两张空闲页保留一张)。
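保留一半空闲页的策略可以用几行 Python 示意(对应代码中 (free_pages & 0x1) == 1 的判断,页对象这里只用字符串占位):

```python
# 示意:Shrink 保留 top page 之后每两张空闲页中的一张,其余释放
def shrink(pages_after_top):
    """pages_after_top: top page 之后的空闲页列表"""
    kept, freed = [], []
    for i, page in enumerate(pages_after_top):
        # 奇数编号的页保留,偶数编号的页释放
        (kept if i % 2 == 1 else freed).append(page)
    return kept, freed
```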

总结

本文从源码的角度介绍了V8的内存管理,可能大家会说对日常工作毫无作用,但读下来感觉还是很有意思,拓展了很多知识。

参考文献

V8之内存管理
浅谈V8引擎中的垃圾回收机制
V8 之旅:Full Compiler

Node.js源码-node_javascript.cc

上一篇讲node运行的文章中,我们提到了LoadEnvironment的LoadersBootstrapperSource方法,它从node_javascript.cc中获取loader文件内容对应的ASCII码;node_javascript.cc位于out/Debug/gen中,它是如何产生的呢?

本篇文章就是来介绍node_javascript.cc是如何产生的以及其中的内容。

node_js2c

下面是node.gyp定义的node_js2c目标:

{
      'target_name': 'node_js2c',
      'type': 'none',
      'toolsets': ['host'],
      'actions': [
        {
          'action_name': 'node_js2c',
          'process_outputs_as_sources': 1,
          'inputs': [
            '<@(library_files)',
            './config.gypi',
            'tools/check_macros.py'
          ],
          'outputs': [
            '<(SHARED_INTERMEDIATE_DIR)/node_javascript.cc',
          ],
          'conditions': [
            [ 'node_use_dtrace=="false" and node_use_etw=="false"', {
              'inputs': [ 'src/notrace_macros.py' ]
            }],
            [ 'node_use_perfctr=="false"', {
              'inputs': [ 'src/noperfctr_macros.py' ]
            }],
            [ 'node_debug_lib=="false"', {
              'inputs': [ 'tools/nodcheck_macros.py' ]
            }],
            [ 'node_debug_lib=="true"', {
              'inputs': [ 'tools/dcheck_macros.py' ]
            }]
          ],
          'action': [
            'python',
            'tools/js2c.py',
            '<@(_outputs)',
            '<@(_inputs)',
          ],
        },
      ],
    }

inputs

我们可以看到inputs中主要有三个输入,library_files、./config.gypi、tools/check_macros.py。

library_files

其中library_files包含如下文件:

'library_files': [
      'lib/internal/bootstrap/loaders.js',
      'lib/internal/bootstrap/node.js',
      'lib/async_hooks.js',
      'lib/assert.js',
      'lib/buffer.js',
      
      ......
      
      'deps/node-inspect/lib/internal/inspect_client.js',
      'deps/node-inspect/lib/internal/inspect_repl.js',
      'deps/acorn/dist/acorn.js',
      'deps/acorn/dist/walk.js',
    ],

基本上是lib和deps中的一些.js文件。

./config.gypi

./config.gypi主要定义了一些target_defaults(作用域.gyp文件中所有的targets)和一些变量。

tools/check_macros.py

宏定义:

macro CHECK(x) = do { if (!(x)) (process._rawDebug("CHECK: x == true"), process.abort()) } while (0);
macro CHECK_EQ(a, b) = CHECK((a) === (b));
macro CHECK_GE(a, b) = CHECK((a) >= (b));
macro CHECK_GT(a, b) = CHECK((a) > (b));
macro CHECK_LE(a, b) = CHECK((a) <= (b));
macro CHECK_LT(a, b) = CHECK((a) < (b));
macro CHECK_NE(a, b) = CHECK((a) !== (b));

outputs

outputs很简单,在debug模式下就是out/Debug/node_javascript.cc。

action

'action': [
            'python',
            'tools/js2c.py',
            '<@(_outputs)',
            '<@(_inputs)',
          ]

翻译成指令就是:

python tools/js2c.py $(outputs) $(inputs)

js2c.py

下面我们来看下js2c.py里做了什么?

def main():
  natives = sys.argv[1]
  source_files = sys.argv[2:]
  JS2C(source_files, [natives])

调用了JS2C,并将inputs中的所有文件路径作为参数传进去。

我们来看下JS2C:

def JS2C(source, target):
  modules = []
  consts = {}
  macros = {}
  macro_lines = []

  for s in source:
    if (os.path.split(str(s))[1]).endswith('macros.py'):
      macro_lines.extend(ReadLines(str(s)))
    else:
      modules.append(s)

  # Process input from all *macro.py files
  # 拿到宏定义
  (consts, macros) = ReadMacros(macro_lines)

  # Build source code lines
  definitions = []
  initializers = []

  for name in modules:
    lines = ReadFile(str(name))
    # 替换宏定义
    lines = ExpandConstants(lines, consts)
    lines = ExpandMacros(lines, macros)

    deprecated_deps = None

    # On Windows, "./foo.bar" in the .gyp file is passed as "foo.bar"
    # so don't assume there is always a slash in the file path.
    if '/' in name or '\\' in name:
      split = re.split('/|\\\\', name)
      if split[0] == 'deps':
        if split[1] == 'node-inspect' or split[1] == 'v8':
          deprecated_deps = split[1:]
        split = ['internal'] + split
      else:
        split = split[1:]
      name = '/'.join(split)

    # if its a gypi file we're going to want it as json
    # later on anyway, so get it out of the way now
    if name.endswith(".gypi"):
      lines = re.sub(r'#.*?\n', '', lines)
      lines = re.sub(r'\'', '"', lines)
    name = name.split('.', 1)[0]
    var = name.replace('-', '_').replace('/', '_')
    key = '%s_key' % var
    value = '%s_value' % var

    definitions.append(Render(key, name))
    definitions.append(Render(value, lines))
    initializers.append(INITIALIZER.format(key=key, value=value))

    if deprecated_deps is not None:
      name = '/'.join(deprecated_deps)
      name = name.split('.', 1)[0]
      var = name.replace('-', '_').replace('/', '_')
      key = '%s_key' % var
      value = '%s_value' % var

      definitions.append(Render(key, name))
      definitions.append(Render(value, DEPRECATED_DEPS.format(module=name)))
      initializers.append(INITIALIZER.format(key=key, value=value))

  # Emit result
  output = open(str(target[0]), "w")
  output.write(TEMPLATE.format(definitions=''.join(definitions),
                               initializers=''.join(initializers)))
  output.close()

这里一共做了如下几件事:

1.拿到宏定义
2.循环遍历文件
	·宏替换
	·获得所有定义的字符串代码
	·获得所有初始化的字符串代码
	·字符串替换
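其中"宏替换"一步,可以用一个极简的常量替换示例来示意(真实的 ExpandConstants/ExpandMacros 还要处理带参数的宏,这里只演示整词替换,常量名为假设):

```python
import re

# 示意:js2c 中常量替换的极简版,按整词把常量名替换为对应的值
def expand_constants(source, consts):
    for name, value in consts.items():
        source = re.sub(r'\b%s\b' % re.escape(name), str(value), source)
    return source
```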

render

def Render(var, data):
  # Treat non-ASCII as UTF-8 and convert it to UTF-16.
  if any(ord(c) > 127 for c in data):
    template = TWO_BYTE_STRING
    data = map(ord, data.decode('utf-8').encode('utf-16be'))
    data = [data[i] * 256 + data[i+1] for i in xrange(0, len(data), 2)]
    data = ToCArray(data)
  else:
    template = ONE_BYTE_STRING
    data = ToCString(data)
  return template.format(var=var, data=data)

判断文件内容中是否存在 ASCII 码超过 127 的字符,若存在,则将整段内容转成 UTF-16(大端序)码元数组,否则按单字节字符串处理。
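这里"两个字节合成一个码元"的计算(data[i] * 256 + data[i+1])可以用JS验证:BMP内字符的UTF-16BE高低字节组合后,正好等于charCodeAt返回的码元(示意代码,假设只处理BMP内字符):

```javascript
// 示意:验证 UTF-16BE 两字节合成一个码元的计算
function toUtf16Units(str) {
  const bytes = [];
  for (const c of str) {
    const code = c.codePointAt(0); // 假设只考虑 BMP 内字符
    bytes.push(code >> 8, code & 0xff); // UTF-16BE:高字节在前
  }
  const units = [];
  for (let i = 0; i < bytes.length; i += 2) {
    units.push(bytes[i] * 256 + bytes[i + 1]); // 与 js2c 中的合并方式一致
  }
  return units;
}

console.log(toUtf16Units('你好')); // 与 '你好'.charCodeAt(n) 一致
```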

node_javascript.cc

node_javascript.cc主要有以下几部分组成:

1.各个模块key、value对应的结构体的定义

static const uint8_t raw_internal_bootstrap_loaders_key[] = { 105,110,116,101,114,110,97,108,47,98,111,111,116,115,116,114,97,112,47,108,
111,97,100,101,114,115 };
static struct : public v8::String::ExternalOneByteStringResource {
  const char* data() const override {
    return reinterpret_cast<const char*>(raw_internal_bootstrap_loaders_key);
  }
  size_t length() const override { return arraysize(raw_internal_bootstrap_loaders_key); }
  void Dispose() override { /* Default calls `delete this`. */ }
  v8::Local<v8::String> ToStringChecked(v8::Isolate* isolate) {
    return v8::String::NewExternalOneByte(isolate, this).ToLocalChecked();
  }
} internal_bootstrap_loaders_key;

static const uint8_t raw_internal_bootstrap_loaders_value[] = { 47,47,32,84,104,105,115,32,102,105,108,101,32,99,114,101,97,116,101,115,
32,116,104,101,32,105,110,116,101,114,110,97,108,32,109,111,100,117,108,101,
32,38,32,98,105,110,100,105,110,103,32,108,111,97,100,101,114,115,32,117,
 };
static struct : public v8::String::ExternalOneByteStringResource {
  const char* data() const override {
    return reinterpret_cast<const char*>(raw_internal_bootstrap_loaders_value);
  }
  size_t length() const override { return arraysize(raw_internal_bootstrap_loaders_value); }
  void Dispose() override { /* Default calls `delete this`. */ }
  v8::Local<v8::String> ToStringChecked(v8::Isolate* isolate) {
    return v8::String::NewExternalOneByte(isolate, this).ToLocalChecked();
  }
} internal_bootstrap_loaders_value;

我们可以看到两个数组和两个struct:数组raw_internal_bootstrap_loaders_key和raw_internal_bootstrap_loaders_value分别记录bootstrap loaders模块的key和value(文件内容);结构体internal_bootstrap_loaders_key和internal_bootstrap_loaders_value均有ToStringChecked方法,而ToStringChecked内部会调用data()方法,也就是说internal_bootstrap_loaders_value.ToStringChecked()便会返回由对应ASCII码数组构造出的v8字符串。

2.初始化函数定义(initializers)

void DefineJavaScript(Environment* env, v8::Local<v8::Object> target) {
  CHECK(target->Set(env->context(),
                  internal_bootstrap_loaders_key.ToStringChecked(env->isolate()),
                  internal_bootstrap_loaders_value.ToStringChecked(env->isolate())).FromJust());

  ......
}

这里主要是将各个模块的key、value挂载到target对象上,之后便可以在.cpp或者.js中取得文件内容并进行执行等操作。

总结

本文主要介绍了node_javascript.cc的产生和内容,这其实也是node中获取native模块最关键的地方。到此为止,已经介绍了node中builtin和native模块的由来,大家也可以和上一篇文章中所提到的getBinding串起来了。

Koa实现原理简要分析

运行流程

1.new app()

首先koa中主要有app,context,response,request四个基类,我们在实例化app的时候,实际上就是初始化了一些app中的属性。

this.proxy = false;
this.middleware = [];
this.subdomainOffset = 2;
this.env = process.env.NODE_ENV || 'development';
this.context = Object.create(context);
this.request = Object.create(request);
this.response = Object.create(response);

2.app.listen()

在我们执行app.listen()时,主要做了两件事:调用http.createServer创建服务,以及用compose将中间件串起来。


这里面的compose是Koa洋葱圈模型的实现关键,后面会详细介绍

3.handle request

当请求来临时,koa首先通过Object.create(this.context)创建我们常用的ctx对象,然后执行中间件,最后在respond中调用res.end(body)结束响应。


洋葱圈的实现

我们知道,koa中的比较重要的部分在于其中间件的挂载和执行,当请求到来时,中间件先顺序执行,再逆序执行,那么这在koa2.x中是如何实现的呢?

1.app.use()

首先当我们执行app.use时,将该中间件push到middleware队列中。

use(fn) {
    ......
    this.middleware.push(fn);
    return this;
 }

2.app.listen()

当执行listen时,会执行const fn = compose(this.middleware);,compose返回一个函数fn;fn执行时,按队列中的顺序依次调用各中间件,并传入ctx及next方法。

function compose (middleware) {
  if (!Array.isArray(middleware)) throw new TypeError('Middleware stack must be an array!')
  for (const fn of middleware) {
    if (typeof fn !== 'function') throw new TypeError('Middleware must be composed of functions!')
  }

  /**
   * @param {Object} context
   * @return {Promise}
   * @api public
   */

  return function (context, next) {
    // last called middleware #
    let index = -1
    return dispatch(0)
    function dispatch (i) {
      if (i <= index) return Promise.reject(new Error('next() called multiple times'))
      index = i
      let fn = middleware[i]
      if (i === middleware.length) fn = next
      if (!fn) return Promise.resolve()
      try {
        return Promise.resolve(fn(context, function next () {
          return dispatch(i + 1)
        }))
      } catch (err) {
        return Promise.reject(err)
      }
    }
  }
}

那么中间件为什么会从前向后执行,然后再从后向前执行呢?

首先,我们在写中间件时会有await next()的用法(注意,await会等到后面的Promise resolve或reject后才会继续向下执行),那么执行await next()就会转而执行dispatch(i + 1),直到最后一个中间件;当最后一个中间件再执行dispatch(i + 1)时,会触发if (!fn) return Promise.resolve(),于是最后一个中间件开始执行await next()后面的逻辑,完成后,倒数第二个接着执行,依次回到第一个中间件。

注意,当中间件中有两处await next()时,会触发if (i <= index) return Promise.reject(new Error('next() called multiple times')),抛出错误。
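上述执行顺序可以用一个可运行的小例子验证(这里内联了与上文等价的简化版compose):

```javascript
// 洋葱圈顺序演示:两个中间件,记录进入/离开的顺序
function compose(middleware) {
  return function (context, next) {
    let index = -1;
    return dispatch(0);
    function dispatch(i) {
      if (i <= index) return Promise.reject(new Error('next() called multiple times'));
      index = i;
      let fn = middleware[i];
      if (i === middleware.length) fn = next;
      if (!fn) return Promise.resolve();
      try {
        return Promise.resolve(fn(context, () => dispatch(i + 1)));
      } catch (err) {
        return Promise.reject(err);
      }
    }
  };
}

const order = [];
const fn = compose([
  async (ctx, next) => { order.push('1-in'); await next(); order.push('1-out'); },
  async (ctx, next) => { order.push('2-in'); await next(); order.push('2-out'); },
]);

// 执行完毕后 order 为 ['1-in', '2-in', '2-out', '1-out']
const done = fn({}).then(() => order);
```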

context/request/response

三者的关系引用深入浅出koa中的一张图。


其中,我们在使用ctx.body等属性或方法时,实际上调用的this.request.body等属性或方法,实际实现就是调用了delegate库,将request和response中一些常用属性和方法挂载到context对象上。

delegate(proto, 'response')
  .method('attachment')
  .method('redirect')
  .method('remove')
  .method('vary')
  .method('set')
  .method('append')
  .method('flushHeaders')
  .access('status')
  .access('message')
  .access('body')
  .access('length')
  .access('type')
  .access('lastModified')
  .access('etag')
  .getter('headerSent')
  .getter('writable');

/**
 * Request delegation.
 */

delegate(proto, 'request')
  .method('acceptsLanguages')
  .method('acceptsEncodings')
  .method('acceptsCharsets')
  .method('accepts')
  .method('get')
  .method('is')
  .access('querystring')
  .access('idempotent')
  .access('socket')
  .access('search')
  .access('method')
  .access('query')
  .access('path')
  .access('url')
  .getter('origin')
  .getter('href')
  .getter('subdomains')
  .getter('protocol')
  .getter('host')
  .getter('hostname')
  .getter('URL')
  .getter('header')
  .getter('headers')
  .getter('secure')
  .getter('stale')
  .getter('fresh')
  .getter('ips')
  .getter('ip');
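delegate的核心机制可以用几行代码示意(假设的最小实现,非delegates库源码):method把方法调用转发到目标对象,access则通过defineProperty定义存取器:

```javascript
// 最小化的 delegate 示意:把 ctx.body / ctx.set 转发到 ctx.response
function delegateAccess(proto, target, name) {
  Object.defineProperty(proto, name, {
    get() { return this[target][name]; },
    set(val) { this[target][name] = val; },
  });
}

function delegateMethod(proto, target, name) {
  proto[name] = function (...args) {
    return this[target][name](...args);
  };
}

const context = {};
delegateAccess(context, 'response', 'body');
delegateMethod(context, 'response', 'set');

const ctx = Object.create(context);
ctx.response = {
  body: null,
  headers: {},
  set(key, val) { this.headers[key] = val; },
};

ctx.body = 'hello';     // 实际写入 ctx.response.body
ctx.set('X-Test', '1'); // 实际调用 ctx.response.set
```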

中间件的书写

koa中间件实现起来比较简单,只要实现一个带有ctx和next参数的函数即可,有兴趣可以看下koa-body等中间件的实现。

Koa1.0中的洋葱圈实现

Koa1.0的中间件还没有async/await,而是用generator和yield来实现,yield next如何做到上述的先顺序再逆序执行呢?我们下面简单回顾一下。

1.compose middleware

var fn = this.experimental
    ? compose_es7(this.middleware)
    : co.wrap(compose(this.middleware));//这里就是我们上面讲的compose()函数

2.co.wrap()

co.wrap = function (fn) {
  createPromise.__generatorFunction__ = fn;
  return createPromise;
  function createPromise() {
    return co.call(this, fn.apply(this, arguments));
  }
};

这里相当于调用了co()方法,把我们之前的compose()函数返回的结果函数作为参数传给了它。

3.co()——逆向执行关键

/**
 * slice() reference.
 */

var slice = Array.prototype.slice;

/**
 * Expose `co`.
 */

module.exports = co['default'] = co.co = co;

/**
 * Wrap the given generator `fn` into a
 * function that returns a promise.
 * This is a separate function so that
 * every `co()` call doesn't create a new,
 * unnecessary closure.
 *
 * @param {GeneratorFunction} fn
 * @return {Function}
 * @api public
 */

co.wrap = function (fn) {
  createPromise.__generatorFunction__ = fn;
  return createPromise;
  function createPromise() {
    return co.call(this, fn.apply(this, arguments));
  }
};

/**
 * Execute the generator function or a generator
 * and return a promise.
 *
 * @param {Function} fn
 * @return {Promise}
 * @api public
 */

function co(gen) {
  var ctx = this;
  var args = slice.call(arguments, 1);

  // we wrap everything in a promise to avoid promise chaining,
  // which leads to memory leak errors.
  // see https://github.com/tj/co/issues/180
  //返回promise
  return new Promise(function(resolve, reject) {
    if (typeof gen === 'function') gen = gen.apply(ctx, args);
    if (!gen || typeof gen.next !== 'function') return resolve(gen);

    onFulfilled();

    /**
     * @param {Mixed} res
     * @return {Promise}
     * @api private
     */

    // promise成功时调用
    // 调用resolve()时执行
    function onFulfilled(res) {
      var ret;
      try {
        // 调用gen.next,到达一个yield
        ret = gen.next(res);
      } catch (e) {
        return reject(e);
      }
      // 将gen.next()返回值传入next()函数
      next(ret);
      return null;
    }

    /**
     * @param {Error} err
     * @return {Promise}
     * @api private
     */

    function onRejected(err) {
      var ret;
      try {
        ret = gen.throw(err);
      } catch (e) {
        return reject(e);
      }
      next(ret);
    }

    /**
     * Get the next value in the generator,
     * return a promise.
     *
     * @param {Object} ret
     * @return {Promise}
     * @api private
     */

    function next(ret) {
      // 如果generator函数执行完毕,调用resolve,执行上述fullfilled函数
      // 并将ret.value传入
      if (ret.done) return resolve(ret.value);
      // 将ret.value转换成promise
      // 转换函数在下面
      var value = toPromise.call(ctx, ret.value);
      // 监听promise的成功/失败
      if (value && isPromise(value)) return value.then(onFulfilled, onRejected);
      return onRejected(new TypeError('You may only yield a function, promise, generator, array, or object, '
        + 'but the following object was passed: "' + String(ret.value) + '"'));
    }
  });
}

/**
 * Convert a `yield`ed value into a promise.
 *
 * @param {Mixed} obj
 * @return {Promise}
 * @api private
 */

function toPromise(obj) {
  if (!obj) return obj;
  if (isPromise(obj)) return obj;
  if (isGeneratorFunction(obj) || isGenerator(obj)) return co.call(this, obj);
  if ('function' == typeof obj) return thunkToPromise.call(this, obj);
  if (Array.isArray(obj)) return arrayToPromise.call(this, obj);
  if (isObject(obj)) return objectToPromise.call(this, obj);
  return obj;
}

/**
 * Convert a thunk to a promise.
 *
 * @param {Function}
 * @return {Promise}
 * @api private
 */

function thunkToPromise(fn) {
  var ctx = this;
  return new Promise(function (resolve, reject) {
    fn.call(ctx, function (err, res) {
      if (err) return reject(err);
      if (arguments.length > 2) res = slice.call(arguments, 1);
      resolve(res);
    });
  });
}

/**
 * Convert an array of "yieldables" to a promise.
 * Uses `Promise.all()` internally.
 *
 * @param {Array} obj
 * @return {Promise}
 * @api private
 */

function arrayToPromise(obj) {
  return Promise.all(obj.map(toPromise, this));
}

/**
 * Convert an object of "yieldables" to a promise.
 * Uses `Promise.all()` internally.
 *
 * @param {Object} obj
 * @return {Promise}
 * @api private
 */

function objectToPromise(obj){
  var results = new obj.constructor();
  var keys = Object.keys(obj);
  var promises = [];
  for (var i = 0; i < keys.length; i++) {
    var key = keys[i];
    var promise = toPromise.call(this, obj[key]);
    if (promise && isPromise(promise)) defer(promise, key);
    else results[key] = obj[key];
  }
  return Promise.all(promises).then(function () {
    return results;
  });

  function defer(promise, key) {
    // predefine the key in the result
    results[key] = undefined;
    promises.push(promise.then(function (res) {
      results[key] = res;
    }));
  }
}

/**
 * Check if `obj` is a promise.
 *
 * @param {Object} obj
 * @return {Boolean}
 * @api private
 */

function isPromise(obj) {
  return 'function' == typeof obj.then;
}

/**
 * Check if `obj` is a generator.
 *
 * @param {Mixed} obj
 * @return {Boolean}
 * @api private
 */

function isGenerator(obj) {
  return 'function' == typeof obj.next && 'function' == typeof obj.throw;
}

/**
 * Check if `obj` is a generator function.
 *
 * @param {Mixed} obj
 * @return {Boolean}
 * @api private
 */
 
function isGeneratorFunction(obj) {
  var constructor = obj.constructor;
  if (!constructor) return false;
  if ('GeneratorFunction' === constructor.name || 'GeneratorFunction' === constructor.displayName) return true;
  return isGenerator(constructor.prototype);
}

/**
 * Check for plain object.
 *
 * @param {Mixed} val
 * @return {Boolean}
 * @api private
 */

function isObject(val) {
  return Object == val.constructor;
}

注意,我们在写每个中间件时,实际都有yield next。onFulfilled这个函数只在两种情况下被调用:一种是调用co时立即执行;另一种是当前yield出的promise完成后,作为then的回调执行。

这里我们传入的fn是一个generator对象,根据上述转换函数,将会继续调用co()函数;执行next()时,传入的ret.value是下一个中间件的generator对象,所以继续调用co()函数,如此递归地执行下去;当最后一个中间件执行完成后,ret.done === true,会调用resolve,返回到上一层中间件。

这个过程其实就是递归调用的过程。
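这个递归过程可以用一个可运行的最小化co来演示(假设的简化实现,只处理yield Promise和yield generator两种情况):

```javascript
// 最小化的 co:yield 出 generator 时递归调用 co,形成先进后出的执行顺序
function co(gen) {
  return new Promise((resolve, reject) => {
    if (typeof gen === 'function') gen = gen();
    function onFulfilled(res) {
      let ret;
      try { ret = gen.next(res); } catch (e) { return reject(e); }
      next(ret);
    }
    function next(ret) {
      if (ret.done) return resolve(ret.value);
      let value = ret.value;
      // yield 出来的 generator 继续交给 co 递归执行 —— 逆序返回的关键
      if (value && typeof value.next === 'function') value = co(value);
      Promise.resolve(value).then(onFulfilled, reject);
    }
    onFulfilled();
  });
}

const order = [];
function* inner() {
  order.push('inner-in');
  yield Promise.resolve();
  order.push('inner-out');
}
function* outer() {
  order.push('outer-in');
  yield inner(); // 相当于中间件里的 yield next
  order.push('outer-out');
}

const done = co(outer).then(() => order);
```

执行完毕后,order 依次为 outer-in、inner-in、inner-out、outer-out,与洋葱圈模型一致。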

编译原理扫盲帖

最近复习了一下编译原理,编译原理主要有以下几个阶段:

1.词法分析,将源文件中的字符分解为一个个独立的单词符号——TOKEN。词法分析的输入是字符流,输出的是一个个词法单元(<类型, 值>)。
2.语法分析,分析程序的短语结构。语法分析从词法分析器输出的token中识别各类短语,并构造语法分析树。
3.语义分析,推算程序的含义。语义分析负责收集标识符的属性信息并存入符号表中,同时负责语义检查。
4.中间代码生成
5.代码优化
6.目标代码生成

其中,语法分析、语义分析、中间代码生成三个阶段可以合为语法制导翻译。

本文将针对上述几个阶段进行简要介绍。

词法分析

词法分析器从左到右地扫描程序中的字符,识别出各个单词,并确定单词类型,输出统一的词法单元token。

我们在做词法分析器时,主要遵循以下几个步骤:

1.确定Token的分类,比如关键字、常量、运算符、标识符、空格、注释等
2.为每一类token确定相应的正则及匹配函数
3.在主流程中逐一匹配正则并削减剩余字符串

我们以sql-parser中的词法分析部分为例:

正则及匹配函数

WHITESPACE = /^[ \n\r]+/;

Lexer.prototype.whitespaceToken = function() {
      var match, newlines, partMatch;
      if (match = WHITESPACE.exec(this.chunk)) {
        partMatch = match[0];
        newlines = partMatch.replace(/[^\n]/g, '').length;
        this.currentLine += newlines;
        if (this.preserveWhitespace) {
          return { name: 'WHITESPACE', value: partMatch };
        }
      }
};

主流程

while (this.chunk = sql.slice(i)) {
        token = this.keywordToken() || this.starToken() || this.booleanToken() || this.functionToken() || this.windowExtension() || this.sortOrderToken() || this.seperatorToken() || this.operatorToken() || this.mathToken() || this.dotToken() || this.conditionalToken() || this.betweenToken() || this.subSelectOpToken() || this.subSelectUnaryOpToken() || this.numberToken() || this.stringToken() || this.parameterToken() || this.parensToken() || this.whitespaceToken() || this.literalToken();
        if (!token) {
          throw new Error("NOTHING CONSUMED: Stopped at - '" + (this.chunk.slice(0, 30)) + "'");
        }

        this.tokens.push(token)
        i += token.value.length;
}

文法

文法用来描述语言的规则,文法G定义为一个四元组(VN,VT,P,S),其中,VN为非终结符集合,VT为终结符集合;P为产生式集合;S称为识别符或开始符号,是一个非终结符,至少要在一条产生式的左部出现。

产生式的形式是α → β,α称为产生式左部,β称为产生式右部,α∈VN,β∈(VN∪VT)*,α ≠ ε

上下文无关文法

文法分为上下文有关文法和上下文无关文法,顾名思义,上下文无关文法就是匹配产生式时,与上下文(前后已经推导出的结果)无关,上下文无关文法的产生式左侧只有非终结符。只要文法的定义里有某个产生式,不管一个非终结符前后的串是什么,就可以应用相应的产生式进行推导。

消除左递归

一个文法含有下列形式的产生式之一时:

1.A→Aβ,A∈VN,β∈V*
2.A→Bβ,B→Aα,A、B∈VN,α、β∈V*

则称该文法是左递归的。

左递归的产生式是无法做自顶向下的语法分析的,所以需要我们消除左递归。消除直接左递归的方式是将其转换成右递归。比如产生式:

P -> Pa|b 

P表示的是b后跟零个或多个a(即ba*),那么我们可以将其转换成如下右递归,

P -> bP'
P' -> aP'|ε

消除左递归有一套通用的算法,算法如下:

从 i = 1 到 n {
    从 j = 1 到 i - 1 {
        设Aj -> d1 | d2 | ... | dk
        将所有规则 Ai -> Aj y换成
        Ai -> d1 y | d2 y | ... | dk y
        移除Ai规则中的直接左递归
    }
}

简单用Javascript实现了一个消除直接递归的函数:

function removeDirectLeftRecursion(grammar) {
    for (let i = 0; i < grammar.size(); i++) {
        let left  = grammar.get(i).getLeft(),
            right = grammar.get(i).getRight();

        let continueFlag = true;
        for (let j = 0; j < right.length; j++) {
            if (left === right[j].charAt(0)) {
                continueFlag = false;
                break;
            }
        }

        if (continueFlag) continue;

        // 新增非终结符 P'
        let newLeft  = `${left}'`;
        grammar.add(new Rule(newLeft));
        grammar.get(grammar.size()-1).add("~"); // "~" 代表 ε

        let generated = [];
        for (let j = 0; j < right.length; j++) {
            if (left === right[j].charAt(0)) {
                // 左递归候选式 Pa 转换成 P' -> aP'
                grammar.get(grammar.size()-1).add(right[j].substring(1) + newLeft);
            } else {
                // 非左递归候选式 b 转换成 P -> bP'
                generated.push(right[j] + newLeft);
            }
        }

        right.set(generated);
    }
}

语法分析

语法分析的目的是构造分析树,按照分析树的构造方向,可以将语法分析分成自顶向下和自底向上分析法两种,下面来分别介绍。

自顶向下

自顶向下是从分析树的顶部(根节点)向底部(叶节点)方向构造分析树。

每一步推导中,都需要做两个选择:

1.替换当前句型中的哪个非终结符
2.用该非终结符的哪个候选式进行替换。

针对第一个选择,有最左推导和最右推导,由于我们通常从左到右扫描输入,自顶向下分析一般使用最左推导;针对第二个选择,将在下面的分析法中介绍。

自顶向下的分析法,对文法有一定的要求,可能需要做文法转换,比如消除左递归,这里不再赘述。

递归下降分析

递归下降由一组过程组成,每个非终结符都对应一个分析过程。该方法从起始非终结符S开始,递归的调用其他的非终结符的对应过程。如果S对应的过程恰好扫描了整个输入串,则成功的完成了递归分析。

这里针对第二个选择,当同一个非终结符对应多个产生式时,可以使用回溯或预测分析的方法。回溯会挨个尝试非终结符的候选式,如果后面的解析发生错误,则回退并尝试下一个;预测分析通过向前看输入流的k个字符,决定应用的产生式,也就是LL(k)分析法。

预测分析法在每一步推导中,根据当前句型的最左非终结符A和当前输入符号a,选择一个正确的A的产生式。预测分析法需要计算非终结符的First集和Follow集,通过这两个集合可以计算产生式的Select集(eg. SELECT(A -> aB)),进而构造预测分析表,预测分析最终就是通过预测分析表来决定选用哪个产生式的。预测分析表的例子如下:

基于回溯的递归下降分析法,每一个非终结符的处理过程大致如下:

function A(scanner) {
	// 选择A的某个产生式,A -> X1X2...XK
	for (i to k) {
		if (Xi为非终结符) {
			Xi(scanner)
		} else if (Xi为终结符 && Xi == scanner.read()) {
			scanner.next()
		} else {
			//	发生错误
		}
	}
}

递归的预测分析法通过预测分析表,决定调用哪个过程。我们在这里假设非终结符A对应两个产生式,SELECT集分别为{:}和{;},大致过程如下:

function A(scanner) {
	// 选择A的某个产生式,A -> X1X2...XK
	for (i to k) {
		if (Xi为非终结符) {
			if (scanner.read() == ':') {
				// 第一个产生式				
			} else if (scanner.read() == ';') {
				//第二个产生式
			}

		} else if (Xi为终结符 && Xi == scanner.read()) {
			scanner.next()
		} else {
			//	发生错误
		}
	}
}
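下面给出一个可运行的递归下降小例子(假设的四则运算文法,非上文文法),每个非终结符对应一个函数,向前看一个字符来决定走哪条产生式:

```javascript
// 递归下降解析并求值四则运算表达式:
//   E -> T (('+'|'-') T)*
//   T -> F (('*'|'/') F)*
//   F -> number | '(' E ')'
function parseExpr(src) {
  let pos = 0;
  const peek = () => src[pos];
  const next = () => src[pos++];

  function E() {
    let v = T();
    while (peek() === '+' || peek() === '-') {
      v = next() === '+' ? v + T() : v - T();
    }
    return v;
  }
  function T() {
    let v = F();
    while (peek() === '*' || peek() === '/') {
      v = next() === '*' ? v * F() : v / F();
    }
    return v;
  }
  function F() {
    if (peek() === '(') {
      next(); // 吃掉 '('
      const v = E();
      next(); // 吃掉 ')'
      return v;
    }
    let num = '';
    while (peek() >= '0' && peek() <= '9') num += next();
    return Number(num);
  }
  return E();
}

console.log(parseExpr('1+2*3'));   // 7
console.log(parseExpr('(1+2)*3')); // 9
```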

非递归预测分析

非递归的预测分析又叫做表驱动的预测分析,结构如下:

主要由预测分析表、扫描器和一个栈组成。原理与树的深度优先遍历类似,将匹配的产生式入栈,当栈顶与当前的输入符号相同时,栈顶出栈,输入符号向后移一位。

算法的大致流程如下:

X = 栈顶符号

// 栈顶不为空
while (X != '$') {
	if (X为终结符) {
		if (X == scanner.read()) {
			stack.pop();
			scanner.next()
		} else {
			throw Error;
		}
	}
	
	// 需查预测分析表M
	if (X为非终结符) {
		if (!M[X, a]) {
			throw Error
		} else {
			// 有对应产生式
			stack.pop();
			// 将产生式中的符号从右向左依次入栈
		}
	}
} 
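上述流程的一个可运行示意如下(假设的小文法 S -> aS | b,非上文文法):

```javascript
// 表驱动的预测分析(LL(1)):用栈 + 预测分析表识别文法 S -> aS | b
function ll1Parse(input) {
  const table = { S: { a: ['a', 'S'], b: ['b'] } }; // 预测分析表 M[X, a]
  const stack = ['$', 'S']; // 栈顶在数组末尾
  let i = 0;
  while (stack[stack.length - 1] !== '$') {
    const X = stack[stack.length - 1];
    const a = input[i];
    if (X === 'a' || X === 'b') {
      // 终结符:与当前输入相同则同时前进,否则报错
      if (X !== a) return false;
      stack.pop();
      i++;
    } else {
      // 非终结符:查表,产生式右部从右向左依次入栈
      const prod = table[X] && table[X][a];
      if (!prod) return false;
      stack.pop();
      for (let j = prod.length - 1; j >= 0; j--) stack.push(prod[j]);
    }
  }
  return i === input.length; // 栈空且输入耗尽才算成功
}

console.log(ll1Parse('aab')); // true
console.log(ll1Parse('aba')); // false
```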

自底向上

自底向上是从分析树的底部(叶节点)向顶部(根节点)方向构造分析树。

移入-归约分析

自底向上的分析通用框架是移入-归约分析。

移入-归约分析的过程如下:

1.移入,对输入串从左到右扫描,将若干个字符入栈,直到可以对符号串进行归约为止
2.归约,栈顶字符归约成某个产生式的左部
3.语法分析器不断循环上述部分,直到栈顶包含了文法开始符号并且输入串为空;当然还有一种情况是检测到了语法错误

这里有个选择是当可以归约时,是继续移入还是直接归约,确定移入还是归约需要向前查看k个输入符号来决定,这就是LR(k)分析。

由于合适的产生式(句柄)是逐步形成的,所以句柄识别的过程是有“状态”的,LR分析法用以下方式描述状态:

S -> bBB
S -> .bBB
S -> b.BB
S -> bB.B
S -> bBB. 

LR分析器的结构如下图所示:

与预测分析器结构的不同在于多了一个状态栈,用来描述当前的句柄状态;同时有动作转移表(ACTION/GOTO)用来描述在某一状态下,遇到某一终结符或非终结符时的动作,该动作可能是移入、归约、状态转移或接受(成功)。

LR分析表的结构如下图所示:

LR分析算法大致流程如下:

while(1) {
	if (ACTION[s, a] == st) {
		// 状态t入状态栈
		// a入符号栈
	} else if (ACTION[s, a] == 归约 A -> X1X2...Xk) {
		// 弹出栈顶k个符号
		// A入符号栈
		// GOTO[t, A]入状态栈
	} else if (ACTION[s, a] == success) break
	else { throw Error }
}

LR(0)分析法就是不参考后续输入字符、直接归约的分析法,其使用条件是不出现移进-归约和归约-归约冲突,也就是同一状态遇到相同输入时只有一种可选动作,没有歧义。虽然应用面比较小,但我们可以通过它来看LR分析表是如何构造的。

讲解LR分析表构造之前,先讲两个概念:增广文法和项目集闭包。

增广文法就是在文法G中加上新开始符号S'和产生式 S' -> S 而得到的文法,引入它是为了保证分析器的起点(以及接受项目)唯一。

项目集闭包用来表示句柄分析的状态,是相同的句柄分析状态的集合。举例如下:

有了项目集闭包后,我们就可以以初始状态为起点,构造LR分析表。结果如下:

构造项目集闭包的大致过程如下:

// I 为某一项目集(状态)
// 返回项目集闭包
function clousure(I) {
	J = I
	
	for (J中每一项 A -> a.Bb) {
		for (文法中每个产生式 B -> xxx) {
			if (B -> .xxx 不在J中) {
				// 将 B -> .xxx 加入J中
			}
		}
	}
	
	return J;
}

构造后继项目集闭包(扩展整个项目集)的大致过程如下:

// I 为某一项目集(状态),X 为某一非终结符
function goto(I, X) {
	// 初始化J为空集
	for (I中每一项 A -> a.Xb) {
		// 将 A -> aX.b 加入J
	}
	
	return clousure(J)
}

以上clousure和goto方法可以得到文法所有状态的集合(项目集规范族),方法是从clousure({ S' -> .S })开始,循环检查集合中所有项目集的goto(I, X)是否已在集合中,不在则加入,直到没有新的项目集加入为止。

有了上述的项目集闭包,构建LR(0)分析表的过程是循环遍历所有项目集中的所有项目,做如下判断:

1.移入,如果项目集i中有 A -> .aB && goto(Ii, a) == Ij,则 ACTION[i, a] = sj
2.状态变换,如果项目集i中有 A -> .B && goto(Ii, B) == Ij,则 GOTO[i, B] = j
3.归约,如果项目集i中有 A -> aB.(设其为第j个产生式),则对所有终结符a,ACTION[i, a] = rj
4.成功,如果项目集i中有 S' -> S.,则 ACTION[i, $] = acc

LR(0)没有考虑分析的上下文环境,有时会出现移进-归约和归约-归约冲突,简单来说就是无法确定该移入还是归约、该用哪个产生式归约。解决这个问题需要知道句柄归约的条件,也就需要向前看输入字符,这就引出了SLR、LR(1)分析法。

SLR分析法借助FOLLOW集来解决冲突(这就要求冲突相关的非终结符的FOLLOW集之间不能存在交集)。其对上述第三个过程进行了改造:设下一个输入字符为x,将归约,如果项目集i中有 A -> aB.,ACTION[i, a] = rj改为归约,如果项目集i中有 A -> aB. && x属于FOLLOW(A),ACTION[i, a] = rj。

在某些情况下,仅仅根据FOLLOW集来解决冲突是不够的,在特定位置,A的后继字符应该是A的FOLLOW集的子集,FOLLOW集可以帮助我们排除错误选项,但无法具体得知真正遇到哪个后继符号时执行归约,这就引出了LR(1)。

LR(1)分析法的关键是得到项目集中每个项目的展望符,也就是归约后可能的后继终结符,当下一个输入字符正好与展望符相同时,说明可以对该项目执行归约操作。展望符的传播规则是:如果存在项目<A -> a.BX, a>,其中a是该项目的展望符,那么由它派生出的项目B -> .b的展望符等于first(Xa)。

语法制导翻译

语义分析本身不涉及什么算法,只是为文法附加对应的动作,完全可以嵌入在语法分析的算法中;其中语法分析、语义分析、中间代码生成可以合为语法制导翻译。

语法制导翻译为文法中的每个符号定义属性,属性分为综合属性和继承属性:综合属性依赖于子节点,继承属性依赖于父节点或兄弟节点。SDT(语法制导翻译方案)的文法如下:

语法分析过程中,综合属性可以在归约时计算,继承属性则需要在对应符号出现前计算;同时,在计算属性的过程中还可以执行附加动作(比如登记符号表)。

语法制导翻译简单来说就是在语法分析过程中计算非终结符的属性、执行附加动作。

其中自顶向下的分析中,递归的方法比较简单,即每个非终结符的处理函数多加一个继承属性的参数,在函数里依照制导方案执行相应动作即可;非递归的方式则需要对符号栈进行扩展,加入属性栈,并且非终结符在栈中应占两项(比如F和F.sync),其中F.sync代表F的综合属性,需要其子节点都计算完成后才能计算:F出栈时,F.sync会暂时留在栈中,直到计算完成后再出栈。

要想在自底向上的语法分析中加入动作,首先需要替换表达式中间的语义动作,使所有语义动作位于产生式末尾。如下所示:

同时自底向上的语法分析中加入动作也需要扩展符号栈,加入属性栈。

总结

本文简要梳理了词法分析、语法分析、语法制导翻译的过程,可以看出其中关键的就在于自顶向下、自底向上的语法分析,而非递归的语法分析都是借助栈来完成的。

Parcel 源码解读

Version
parcel-bundler: 1.11.0

Parcel中主要包含如下类:

  • Bundler,打包逻辑的入口
  • Parser,Asset的注册表,根据文件后缀查找并创建对应的Asset类
  • Asset,文件资源类,负责自身资源处理、依赖收集等操作,同时记录着原始资源、打包结果等信息;HTMLAsset、JSAsset等资源的Asset继承自此基类
  • Bundle,打包输出文件类,它由多个资源(Asset)组成,会根据当前Bundle类的类型查找对应的打包器(从PackagerRegistry中获取),调用打包器的package方法将自身包含的Asset打包进目的文件;bundle可以有子bundle,当动态从该bundle导入文件的时候,或者导入一个其他类型资源的文件的时候会产生childBundles
  • PackagerRegistry,Packager注册表,根据资源类型(基本上是Bundle在调用,所以基本上是Bundle的类型,也可以说是对应Asset的类型)注册、获取打包器(Packager)
  • Packager,打包组合类,用于将各个Asset产生的结果打包进目标文件,比如JSPackager将类型为JS的Asset产生的内容,打包以Bundle.name为名字的文件中
  • HMRServer,热更服务,其中包含启动ws服务,触发update等方法
  • FSCache,缓存
  • Resolver,资源路径解析类,如何对代码中引入的各种相对路径的资源路径进行解析,从而找到该模块的绝对路径

它们之间的调用及继承关系如下:

  • Bundler作为打包的入口,其中包含有Parser、Bundle、HMRServer、FSCache、Resolver等类
  • 构建的第一阶段,Bundler类调用Parser类获取文件对应的Asset,然后调用对应Asset的process等方法,取得Asset树
  • 构建的第二阶段,Bundler类中实例化根Bundle(初始空bundle),根据第一阶段中Asset的依赖信息,构建Bundle树
  • 构建的第三阶段,调用根Bundle类中的package方法,根据Bundle树进行文件写入等操作
  • Asset和Packager为基类,对应类型的(HTML、JS等)类继承自此基类

打包流程

打包的整体过程就在Bundler.bundle()方法中,代码如下:

async bundle() {
    // If another bundle is already pending, wait for that one to finish and retry.
    if (this.pending) {
      return new Promise((resolve, reject) => {
        this.once('buildEnd', () => {
          this.bundle().then(resolve, reject);
        });
      });
    }

    ......

    logger.clear();
    logger.progress('Building...');

    try {
      // Start worker farm, watcher, etc. if needed
      await this.start();

      // Emit start event, after bundler is initialised
      this.emit('buildStart', this.entryFiles);

      // If this is the initial bundle, ensure the output directory exists, and resolve the main asset.
      if (isInitialBundle) {
        await fs.mkdirp(this.options.outDir);

        this.entryAssets = new Set();
        for (let entry of this.entryFiles) {
          try {
            let asset = await this.resolveAsset(entry);
            this.buildQueue.add(asset);
            this.entryAssets.add(asset);
          } catch (err) {
            throw new Error(
              `Cannot resolve entry "${entry}" from "${this.options.rootDir}"`
            );
          }
        }

        if (this.entryAssets.size === 0) {
          throw new Error('No entries found.');
        }

        initialised = true;
      }

      // Build the queued assets.
      let loadedAssets = await this.buildQueue.run();

      // The changed assets are any that don't have a parent bundle yet
      // plus the ones that were in the build queue.
      let changedAssets = [...this.findOrphanAssets(), ...loadedAssets];

      // Invalidate bundles
      for (let asset of this.loadedAssets.values()) {
        asset.invalidateBundle();
      }

      logger.progress(`Producing bundles...`);

      // Create a root bundle to hold all of the entry assets, and add them to the tree.
      this.mainBundle = new Bundle();
      for (let asset of this.entryAssets) {
        this.createBundleTree(asset, this.mainBundle);
      }

      // If there is only one child bundle, replace the root with that bundle.
      if (this.mainBundle.childBundles.size === 1) {
        this.mainBundle = Array.from(this.mainBundle.childBundles)[0];
      }

      // Generate the final bundle names, and replace references in the built assets.
      this.bundleNameMap = this.mainBundle.getBundleNameMap(
        this.options.contentHash
      );

      for (let asset of changedAssets) {
        asset.replaceBundleNames(this.bundleNameMap);
      }

      // Emit an HMR update if this is not the initial bundle.
      if (this.hmr && !isInitialBundle) {
        this.hmr.emitUpdate(changedAssets);
      }

      logger.progress(`Packaging...`);

      // Package everything up
      this.bundleHashes = await this.mainBundle.package(
        this,
        this.bundleHashes
      );

      ......
      
      return this.mainBundle;
    } catch (err) {
      
      ......
      
    } finally {
      this.pending = false;
      this.emit('buildEnd');

      // If not in watch mode, stop the worker farm so we don't keep the process running.
      if (!this.watcher && this.options.killWorkers) {
        await this.stop();
      }
    }
  }

这里主要做了如下几件事:

  • 准备工作,加载插件等
  • 根据入口文件及其依赖构建Asset Tree
  • 根据Asset Tree构建Bundle Tree
  • 根据Bundle Tree进行Package操作

下面我们一步一步的讲解:

准备工作

准备工作主要在Bundler.start()中,代码如下:

async start() {
    if (this.farm) {
      return;
    }

    await this.loadPlugins();

    if (!this.options.env) {
      await loadEnv(Path.join(this.options.rootDir, 'index'));
      this.options.env = process.env;
    }

    this.options.extensions = Object.assign({}, this.parser.extensions);
    this.options.bundleLoaders = this.bundleLoaders;

    if (this.options.watch) {
      this.watcher = new Watcher();
      // Wait for ready event for reliable testing on watcher
      if (process.env.NODE_ENV === 'test' && !this.watcher.ready) {
        await new Promise(resolve => this.watcher.once('ready', resolve));
      }
      this.watcher.on('change', this.onChange.bind(this));
    }

    if (this.options.hmr) {
      this.hmr = new HMRServer();
      this.options.hmrPort = await this.hmr.start(this.options);
    }

    this.farm = await WorkerFarm.getShared(this.options, {
      workerPath: require.resolve('./worker.js')
    });
  }

这里主要做了如下几件事

  • 加载Parcel插件
  • 监听文件变化(可选)
  • 启动HMR服务(可选)

加载Parcel插件

加载Parcel插件的代码如下:

async loadPlugins() {
    let relative = Path.join(this.options.rootDir, 'index');
    let pkg = await config.load(relative, ['package.json']);
    if (!pkg) {
      return;
    }

    try {
      let deps = Object.assign({}, pkg.dependencies, pkg.devDependencies);
      for (let dep in deps) {
        const pattern = /^(@.*\/)?parcel-plugin-.+/;
        if (pattern.test(dep)) {
          let plugin = await localRequire(dep, relative);
          await plugin(this);
        }
      }
    } catch (err) {
      logger.warn(err);
    }
  }

加载插件步骤如下:

  • 读取根目录上的package.json
  • 循环遍历dependencies和devDependencies
    • 查找其中满足parcel-plugin-格式的依赖
    • 调用localRequire方法进行加载,localRequire获取到文件路径并缓存,然后做require操作(如果没有安装该npm包,则会调用npm / yarn install进行安装)。localRequire可以说是一个代理模式,代理了对文件的访问
    • 执行插件

注意,这里的localRequire就是一个代理模式,中间加入了缓存机制,控制了模块的访问。
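localRequire这种"代理 + 缓存"的访问方式可以用几行代码示意(假设的简化实现,省略了路径解析和自动安装逻辑):

```javascript
// 代理模式示意:对"加载模块"这一访问加入缓存层
// 真实实现里未命中缓存且包未安装时,还会触发 npm/yarn install
function makeLocalRequire(loader) {
  const cache = new Map();
  return function localRequire(name) {
    if (!cache.has(name)) {
      cache.set(name, loader(name)); // 真实实现:resolve 出路径并 require
    }
    return cache.get(name);
  };
}

let loadCount = 0;
const localRequire = makeLocalRequire(name => {
  loadCount++;
  return { plugin: name };
});

localRequire('parcel-plugin-foo');
localRequire('parcel-plugin-foo'); // 命中缓存,不再触发 loader
```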

监听文件变化和HMR后面会进行介绍。

构建Asset Tree

构建Asset Tree的主要逻辑在Bundler.bundle()方法中,代码如下:

// If this is the initial bundle, ensure the output directory exists, and resolve the main asset.
      if (isInitialBundle) {
        await fs.mkdirp(this.options.outDir);

        this.entryAssets = new Set();
        for (let entry of this.entryFiles) {
          try {
            let asset = await this.resolveAsset(entry);
            this.buildQueue.add(asset);
            this.entryAssets.add(asset);
          } catch (err) {
            throw new Error(
              `Cannot resolve entry "${entry}" from "${this.options.rootDir}"`
            );
          }
        }

        if (this.entryAssets.size === 0) {
          throw new Error('No entries found.');
        }

        initialised = true;
      }

      // Build the queued assets.
      let loadedAssets = await this.buildQueue.run();

这里主要做了如下几件事:

  • 遍历入口文件
    • 根据文件后缀获取到入口文件对应的Asset实例
    • 将Asset实例加入到buildQueue中
  • 执行buildQueue.run()

Asset类

首先说明下Asset,Asset是文件资源类,与文件保持一对一的关系,Asset基类代码如下:

class Asset {
  constructor(name, options) {
    this.id = null;
    this.name = name;
    this.basename = path.basename(this.name);
    this.relativeName = path
      .relative(options.rootDir, this.name)
      .replace(/\\/g, '/');
   	
   	......
   	
    this.contents = options.rendition ? options.rendition.value : null;
    this.ast = null;
    this.generated = null;
    
    ......
  }

  shouldInvalidate() {
    return false;
  }

  async loadIfNeeded() {
    if (this.contents == null) {
      this.contents = await this.load();
    }
  }

  async parseIfNeeded() {
    await this.loadIfNeeded();
    if (!this.ast) {
      this.ast = await this.parse(this.contents);
    }
  }

  async getDependencies() {
    if (
      this.options.rendition &&
      this.options.rendition.hasDependencies === false
    ) {
      return;
    }

    await this.loadIfNeeded();

    if (this.contents && this.mightHaveDependencies()) {
      await this.parseIfNeeded();
      await this.collectDependencies();
    }
  }

  addDependency(name, opts) {
    this.dependencies.set(name, Object.assign({name}, opts));
  }

  addURLDependency(url, from = this.name, opts) {
    if (!url || isURL(url)) {
      return url;
    }

    if (typeof from === 'object') {
      opts = from;
      from = this.name;
    }

    const parsed = URL.parse(url);
    let depName;
    let resolved;
    let dir = path.dirname(from);
    const filename = decodeURIComponent(parsed.pathname);

    if (filename[0] === '~' || filename[0] === '/') {
      if (dir === '.') {
        dir = this.options.rootDir;
      }
      depName = resolved = this.resolver.resolveFilename(filename, dir);
    } else {
      resolved = path.resolve(dir, filename);
      depName = './' + path.relative(path.dirname(this.name), resolved);
    }

    this.addDependency(depName, Object.assign({dynamic: true, resolved}, opts));

    parsed.pathname = this.options.parser
      .getAsset(resolved, this.options)
      .generateBundleName();

    return URL.format(parsed);
  }

  ......

  parse() {
    // do nothing by default
  }

  collectDependencies() {
    // do nothing by default
  }

  async pretransform() {
    // do nothing by default
  }

  async transform() {
    // do nothing by default
  }

  async generate() {
    return {
      [this.type]: this.contents
    };
  }

  async process() {
    // Generate the id for this asset, unless it has already been set.
    // We do this here rather than in the constructor to avoid unnecessary work in the main process.
    // In development, the id is just the relative path to the file, for easy debugging and performance.
    // In production, we use a short hash of the relative path.
    if (!this.id) {
      this.id =
        this.options.production || this.options.scopeHoist
          ? md5(this.relativeName, 'base64').slice(0, 4)
          : this.relativeName;
    }

    if (!this.generated) {
      await this.loadIfNeeded();
      await this.pretransform();
      await this.getDependencies();
      await this.transform();
      this.generated = await this.generate();
    }

    return this.generated;
  }

  ......
}

这里主要关注下process方法,也就是文件的文件资源的处理过程:

  • loadIfNeeded,加载文件内容
  • pretransform,预处理,比如js资源会用babel()进行转换
  • getDependencies, 这里主要对资源字符串进行解析,例如html字符串用posthtml-parser, js资源用babylon.parse来解析。然后收集依赖collectDependencies,具体操作稍后分析。
  • transform, 资源转换步骤接收 AST并对其进行遍历,在此过程中对节点进行添加、更新及移除等操作。
  • generate,产出一份处理后的文件内容,基本返回的数据格式是[this.type]: this.contents
  • generateHash,根据处理后的文件内容,产出对应hash值

注意,这里不同的子类会继承自此基类,实现基类暴露的接口,这其实就是针对接口编程的设计原则。

收集依赖的过程会在下面进行详细介绍。
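"针对接口编程"可以用一个同步简化版示意(假设的代码,非parcel源码):基类的process是模板方法,固定处理流程;子类只需实现parse/collectDependencies/generate:

```javascript
// 模板方法示意:Asset 基类定义流程,子类实现各步骤
class Asset {
  constructor(contents) {
    this.contents = contents;
    this.ast = null;
    this.dependencies = new Map();
  }
  parse() { /* 子类实现 */ }
  collectDependencies() { /* 子类实现 */ }
  generate() { return { raw: this.contents }; }
  process() {
    this.ast = this.parse(this.contents);
    this.collectDependencies();
    return this.generate();
  }
}

class JSONAsset extends Asset {
  parse(contents) { return JSON.parse(contents); }
  collectDependencies() {
    // 假设 JSON 里的 "import" 字段是依赖
    if (this.ast.import) this.dependencies.set(this.ast.import, { name: this.ast.import });
  }
  generate() { return { js: `module.exports = ${this.contents};` }; }
}

const asset = new JSONAsset('{"import":"./other.json","a":1}');
const generated = asset.process();
```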

Bundler.resolveAsset

根据文件后缀获取到入口文件对应的Asset实例的逻辑Bundler.resolveAsset中,代码如下:

  async resolveAsset(name, parent) {
    let {path} = await this.resolver.resolve(name, parent);
    return this.getLoadedAsset(path);
  }

  getLoadedAsset(path) {
    if (this.loadedAssets.has(path)) {
      return this.loadedAssets.get(path);
    }

    let asset = this.parser.getAsset(path, this.options);
    this.loadedAssets.set(path, asset);

    this.watch(path, asset);
    return asset;
  }

主要做了如下两件事:

  • 利用Resolver类,获取到文件的绝对路径
  • 利用Parser类,根据文件的后缀获取到Asset实例

这里简单说下Parser,Parser可以说是Asset的注册表,根据文件类型(后缀)存储对应的Asset类,parser.getAsset方法根据文件路径找到对应的Asset类并创建实例。

buildQueue.run

buildQueue是PromiseQueue的实例,PromiseQueue.run方法将队列中的内容依次交给process函数处理。对PromiseQueue感兴趣的话可以去看下代码,这里不再赘述。
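PromiseQueue的行为可以用一个最小化实现示意(假设的简化版;真实实现还支持并发与去重,这里按顺序处理):

```javascript
// 最小化的 PromiseQueue:add 入队,run 依次用回调处理队列内容
class PromiseQueue {
  constructor(callback) {
    this.callback = callback;
    this.queue = [];
  }
  add(item) {
    this.queue.push(item);
  }
  async run() {
    const results = [];
    while (this.queue.length) {
      const item = this.queue.shift();
      results.push(await this.callback(item)); // 处理过程中还可以继续 add
    }
    return results;
  }
}

const queue = new PromiseQueue(async n => n * 2);
queue.add(1);
queue.add(2);
const done = queue.run(); // Promise,resolve 为处理结果数组
```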

buildQueue的初始化代码在Bundler的constructor中,代码如下:

this.buildQueue = new PromiseQueue(this.processAsset.bind(this));

In the scenario above, this means running Bundler.processAsset(Asset) for the Asset of every entry file.

Bundler.processAsset() ultimately calls Bundler.loadAsset():

async loadAsset(asset) {
    ......
 
    if (!processed || asset.shouldInvalidate(processed.cacheData)) {
      processed = await this.farm.run(asset.name);
      cacheMiss = true;
    }

    ......

    // Call the delegate to get implicit dependencies
    let dependencies = processed.dependencies;
    if (this.delegate.getImplicitDependencies) {
      let implicitDeps = await this.delegate.getImplicitDependencies(asset);
      if (implicitDeps) {
        dependencies = dependencies.concat(implicitDeps);
      }
    }

    // Resolve and load asset dependencies
    let assetDeps = await Promise.all(
      dependencies.map(async dep => {
        if (dep.includedInParent) {
          // This dependency is already included in the parent's generated output,
          // so no need to load it. We map the name back to the parent asset so
          // that changing it triggers a recompile of the parent.
          this.watch(dep.name, asset);
        } else {
          dep.parent = asset.name;
          let assetDep = await this.resolveDep(asset, dep);
          if (assetDep) {
            await this.loadAsset(assetDep);
          }

          return assetDep;
        }
      })
    );

    // Store resolved assets in their original order
    dependencies.forEach((dep, i) => {
      asset.dependencies.set(dep.name, dep);
      let assetDep = assetDeps[i];
      if (assetDep) {
        asset.depAssets.set(dep, assetDep);
        dep.resolved = assetDep.name;
      }
    });

    logger.verbose(`Built ${asset.relativeName}...`);

    if (this.cache && cacheMiss) {
      this.cache.write(asset.name, processed);
    }
  }

This does the following:

  • this.farm.run(asset.name) — which ends up calling the processAsset method of the Pipeline class in /src/pipeline.js, running asset.process() to process the asset
  • For each of the asset's dependencies, run resolveDep and this.loadAsset(assetDep) to obtain the dependency's asset
  • Record all dependency assets in asset.depAssets

At this point the asset tree is complete. Construction is recursive: process the asset itself, then recursively process its dependencies, yielding the final asset tree.
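The recursive shape of this construction can be sketched like so (the in-memory files table here stands in for the real resolver and file system):

```javascript
// The recursive shape of loadAsset, reduced to its essence: process an
// asset, then recurse into each dependency, recording the resolved child
// in depAssets. The files table stands in for the resolver + file system.
const files = {
  'index.js': ['a.js', 'b.js'],
  'a.js': ['b.js'],
  'b.js': []
};

const loaded = new Map(); // mirrors Bundler.loadedAssets

function loadAsset(name) {
  if (loaded.has(name)) return loaded.get(name); // shared dep: reuse
  const asset = { name, depAssets: new Map() };
  loaded.set(name, asset); // register before recursing (handles cycles)
  for (const dep of files[name]) {
    asset.depAssets.set(dep, loadAsset(dep));
  }
  return asset;
}

const root = loadAsset('index.js');
// b.js appears twice in the graph but is loaded only once:
console.log(root.depAssets.get('b.js') ===
            root.depAssets.get('a.js').depAssets.get('b.js')); // prints "true"
```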

Note that some details are covered later: for example, this.farm above is a worker-farm instance that uses multiple processes to speed up the build, and dependency collection differs by file type.

this.farm is also an application of the proxy pattern.

Building the Bundle Tree

The main logic for building the bundle tree also lives in Bundler.Bundle():

// Create a root bundle to hold all of the entry assets, and add them to the tree.
this.mainBundle = new Bundle();
for (let asset of this.entryAssets) {
  this.createBundleTree(asset, this.mainBundle);
}

This does two things:

  • Create a root bundle
  • Use this.createBundleTree to add every entry asset into the root bundle

The Bundle class

Bundle models a group of files: each Bundle represents one bundled output file, with properties such as its child assets and childBundles:

class Bundle {
  constructor(type, name, parent, options = {}) {
    this.type = type;
    this.name = name;
    this.parentBundle = parent;
    this.entryAsset = null;
    this.assets = new Set();
    this.childBundles = new Set();
    this.siblingBundles = new Set();
    this.siblingBundlesMap = new Map();
    
    ......
   
  }

  static createWithAsset(asset, parentBundle, options) {
    let bundle = new Bundle(
      asset.type,
      Path.join(asset.options.outDir, asset.generateBundleName()),
      parentBundle,
      options
    );

    bundle.entryAsset = asset;
    bundle.addAsset(asset);
    return bundle;
  }

  addAsset(asset) {
    asset.bundles.add(this);
    this.assets.add(asset);
  }

  ......

  getSiblingBundle(type) {
    if (!type || type === this.type) {
      return this;
    }

    if (!this.siblingBundlesMap.has(type)) {
      let bundle = new Bundle(
        type,
        Path.join(
          Path.dirname(this.name),
          // keep the original extension for source map files, so we have
          // .js.map instead of just .map
          type === 'map'
            ? Path.basename(this.name) + '.' + type
            : Path.basename(this.name, Path.extname(this.name)) + '.' + type
        ),
        this
      );

      this.childBundles.add(bundle);
      this.siblingBundles.add(bundle);
      this.siblingBundlesMap.set(type, bundle);
    }

    return this.siblingBundlesMap.get(type);
  }

  createChildBundle(entryAsset, options = {}) {
    let bundle = Bundle.createWithAsset(entryAsset, this, options);
    this.childBundles.add(bundle);
    return bundle;
  }

  createSiblingBundle(entryAsset, options = {}) {
    let bundle = this.createChildBundle(entryAsset, options);
    this.siblingBundles.add(bundle);
    return bundle;
  }

  ......

  async package(bundler, oldHashes, newHashes = new Map()) {
    let promises = [];
    let mappings = [];

    if (!this.isEmpty) {
      let hash = this.getHash();
      newHashes.set(this.name, hash);

      if (!oldHashes || oldHashes.get(this.name) !== hash) {
        promises.push(this._package(bundler));
      }
    }

    for (let bundle of this.childBundles.values()) {
      if (bundle.type === 'map') {
        mappings.push(bundle);
      } else {
        promises.push(bundle.package(bundler, oldHashes, newHashes));
      }
    }

    await Promise.all(promises);
    for (let bundle of mappings) {
      await bundle.package(bundler, oldHashes, newHashes);
    }
    return newHashes;
  }

  async _package(bundler) {
    let Packager = bundler.packagers.get(this.type);
    let packager = new Packager(this, bundler);

    let startTime = Date.now();
    await packager.setup();
    await packager.start();

    let included = new Set();
    for (let asset of this.assets) {
      await this._addDeps(asset, packager, included);
    }

    await packager.end();

    this.totalSize = packager.getSize();

    let assetArray = Array.from(this.assets);
    let assetStartTime =
      this.type === 'map'
        ? 0
        : assetArray.sort((a, b) => a.startTime - b.startTime)[0].startTime;
    let assetEndTime =
      this.type === 'map'
        ? 0
        : assetArray.sort((a, b) => b.endTime - a.endTime)[0].endTime;
    let packagingTime = Date.now() - startTime;
    this.bundleTime = assetEndTime - assetStartTime + packagingTime;
  }

  async _addDeps(asset, packager, included) {
    if (!this.assets.has(asset) || included.has(asset)) {
      return;
    }

    included.add(asset);
    
    for (let depAsset of asset.depAssets.values()) {
      await this._addDeps(depAsset, packager, included);
    }

    await packager.addAsset(asset);

    const assetSize = packager.getSize() - this.totalSize;
    if (assetSize > 0) {
      this.addAssetSize(asset, assetSize);
    }
  }

  ......
}
  • A Bundle has assets, childBundles, and other properties; addAsset registers an asset on the bundle, and createChildBundle creates a child bundle — which is how the bundle tree is built.
  • Besides building the bundle tree, Bundle also has a package method that recursively invokes package on every bundle in the tree to produce the output files.
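The two tree-building operations named above can be sketched as follows (MiniBundle is an invented reduction of the real class):

```javascript
// A sketch of Bundle's tree-building surface: addAsset registers an
// asset on a bundle, createChildBundle hangs a new bundle (with its
// entry asset) underneath. MiniBundle is invented for illustration.
class MiniBundle {
  constructor(type, entryAsset = null, parent = null) {
    this.type = type;
    this.entryAsset = entryAsset;
    this.parentBundle = parent;
    this.assets = new Set();
    this.childBundles = new Set();
    if (entryAsset) this.addAsset(entryAsset);
  }
  addAsset(asset) { this.assets.add(asset); }
  createChildBundle(entryAsset) {
    const child = new MiniBundle(entryAsset.type, entryAsset, this);
    this.childBundles.add(child);
    return child;
  }
}

const root = new MiniBundle('js', { name: 'index.js', type: 'js' });
const lazy = root.createChildBundle({ name: 'lazy.js', type: 'js' });
console.log(root.childBundles.size);     // prints "1"
console.log(lazy.parentBundle === root); // prints "true"
```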

Bundler.createBundleTree

Bundler.createBundleTree() is the main method for building the bundle tree; its job is to add entry assets into the root bundle:

createBundleTree(asset, bundle, dep, parentBundles = new Set()) {
    if (dep) {
      asset.parentDeps.add(dep);
    }

    if (asset.parentBundle && !bundle.isolated) {
      // If the asset is already in a bundle, it is shared. Move it to the lowest common ancestor.
      if (asset.parentBundle !== bundle) {
        let commonBundle = bundle.findCommonAncestor(asset.parentBundle);

        // If the common bundle's type matches the asset's, move the asset to the common bundle.
        // Otherwise, proceed with adding the asset to the new bundle below.
        if (asset.parentBundle.type === commonBundle.type) {
          this.moveAssetToBundle(asset, commonBundle);
          return;
        }
      } else {
        return;
      }

      // Detect circular bundles
      if (parentBundles.has(asset.parentBundle)) {
        return;
      }
    }

    ......

    // If the asset generated a representation for the parent bundle type, and this
    // is not an async import, add it to the current bundle
    if (bundle.type && asset.generated[bundle.type] != null && !dep.dynamic) {
      bundle.addAsset(asset);
    }

    if ((dep && dep.dynamic) || !bundle.type) {
      // If the asset is already the entry asset of a bundle, don't create a duplicate.
      if (isEntryAsset) {
        return;
      }

      // Create a new bundle for dynamic imports
      bundle = bundle.createChildBundle(asset, dep);
    } else if (
      asset.type &&
      !this.packagers.get(asset.type).shouldAddAsset(bundle, asset)
    ) {
      // If the asset is already the entry asset of a bundle, don't create a duplicate.
      if (isEntryAsset) {
        return;
      }

      // No packager is available for this asset type, or the packager doesn't support
      // combining this asset into the bundle. Create a new bundle with only this asset.
      bundle = bundle.createSiblingBundle(asset, dep);
    } else {
      // Add the asset to the common bundle of the asset's type
      bundle.getSiblingBundle(asset.type).addAsset(asset);
    }

    // Add the asset to sibling bundles for each generated type
    if (asset.type && asset.generated[asset.type]) {
      for (let t in asset.generated) {
        if (asset.generated[t]) {
          bundle.getSiblingBundle(t).addAsset(asset);
        }
      }
    }

    asset.parentBundle = bundle;
    parentBundles.add(bundle);

    for (let [dep, assetDep] of asset.depAssets) {
      this.createBundleTree(assetDep, bundle, dep, parentBundles);
    }

    parentBundles.delete(bundle);
    return bundle;
  }

This does the following:

  • Handle duplicate bundling; if the asset is already bundled, a separate branch runs (detailed below)
  • If asset.generated has output matching the bundle's type and the file is not a dynamic import, add the asset to the bundle's assets
  • If the file is a dynamic import, or this is the initial root bundle (no type yet), create a child bundle to hold the asset and make that child the current bundle
  • Add the other output types in asset.generated to the bundle's sibling bundles
  • Iterate the asset's dependencies (depAssets) and recursively build the bundle tree, passing the current bundle into Bundler.createBundleTree as the root

Note: how does it detect duplicate bundling?

 if (asset.parentBundle) {
   // If the asset is already in a bundle, it is shared. Move it to the lowest common ancestor.
   if (asset.parentBundle !== bundle) {
     let commonBundle = bundle.findCommonAncestor(asset.parentBundle);
     if (
       asset.parentBundle !== commonBundle &&
       asset.parentBundle.type === commonBundle.type
     ) {
       this.moveAssetToBundle(asset, commonBundle);
       return;
     }
   } else return;
 }
  • If an asset's parentBundle exists but differs from the bundle currently claiming it, move the asset to their lowest common ancestor bundle, so the same code is not packed into two bundles.
  • If an asset's parentBundle exists and equals the current bundle, it has already been bundled, so the rest of the bundling procedure is skipped.
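One plausible way to implement the findCommonAncestor lookup used above is to walk one bundle's parentBundle chain into a set, then climb from the other until a match appears (a sketch — Parcel's real implementation may differ in details):

```javascript
// Lowest-common-ancestor lookup between two bundles: collect the first
// bundle's ancestor chain, then climb from the second until a hit.
// A hedged sketch; not Parcel's actual findCommonAncestor.
function findCommonAncestor(a, b) {
  const ancestors = new Set();
  for (let n = a; n; n = n.parentBundle) ancestors.add(n);
  for (let n = b; n; n = n.parentBundle) {
    if (ancestors.has(n)) return n;
  }
  return null;
}

const root = { name: 'root', parentBundle: null };
const left = { name: 'left', parentBundle: root };
const right = { name: 'right', parentBundle: root };
const leaf = { name: 'leaf', parentBundle: left };

console.log(findCommonAncestor(leaf, right).name); // prints "root"
console.log(findCommonAncestor(leaf, left).name);  // prints "left"
```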

Package

The entry point for packaging is in Bundler.Bundle():

// Package everything up
this.bundleHashes = await this.mainBundle.package(
  this,
  this.bundleHashes
);

This simply calls mainBundle.package, starting the packaging from the root bundle.

bundle.package

With the bundle tree built, package is called recursively from the root bundle down. Bundle.package() looks like this:

async package(bundler, oldHashes, newHashes = new Map()) {
    let promises = [];
    let mappings = [];

    if (!this.isEmpty) {
      let hash = this.getHash();
      newHashes.set(this.name, hash);

      if (!oldHashes || oldHashes.get(this.name) !== hash) {
        promises.push(this._package(bundler));
      }
    }

    for (let bundle of this.childBundles.values()) {
      if (bundle.type === 'map') {
        mappings.push(bundle);
      } else {
        promises.push(bundle.package(bundler, oldHashes, newHashes));
      }
    }

    await Promise.all(promises);
    for (let bundle of mappings) {
      await bundle.package(bundler, oldHashes, newHashes);
    }
    return newHashes;
  }

This does the following:

  • Compute the bundle's hash (derived from the hashes of the assets it contains); packaging only runs when there is no old hash or the new hash differs from the old one
  • Starting from the root, recursively call package on every bundle
    • Look up the Packager for the bundle's type (the output file type), then call Packager.addAsset(asset) to write each asset's generated content into the output file stream
    • Each Bundle instance produces one final output file

Packager

Packager has a subclass per bundle type; callers register and look up the Packager for a given type through PackagerRegistry.

The base class:

class Packager {
  constructor(bundle, bundler) {
    this.bundle = bundle;
    this.bundler = bundler;
    this.options = bundler.options;
  }

  static shouldAddAsset() {
    return true;
  }

  async setup() {
    // Create sub-directories if needed
    if (this.bundle.name.includes(path.sep)) {
      await mkdirp(path.dirname(this.bundle.name));
    }

    this.dest = fs.createWriteStream(this.bundle.name);
    this.dest.write = promisify(this.dest.write.bind(this.dest));
    this.dest.end = promisify(this.dest.end.bind(this.dest));
  }

  async write(string) {
    await this.dest.write(string);
  }

  ......
}

The methods to focus on are setup and write: the former creates a writable file stream, the latter writes a string into the file.

For the subclasses, take JSPackager as an example:

class JSPackager extends Packager {
  async start() {
    this.first = true;
    this.dedupe = new Map();
    this.bundleLoaders = new Set();
    this.externalModules = new Set();

    let preludeCode = this.options.minify ? prelude.minified : prelude.source;
    if (this.options.target === 'electron') {
      preludeCode =
        `process.env.HMR_PORT=${
          this.options.hmrPort
        };process.env.HMR_HOSTNAME=${JSON.stringify(
          this.options.hmrHostname
        )};` + preludeCode;
    }
    await this.write(preludeCode + '({');
    this.lineOffset = lineCounter(preludeCode);
  }

  async addAsset(asset) {
    // If this module is referenced by another JS bundle, it needs to be exposed externally.
    // In that case, don't dedupe the asset as it would affect the module ids that are referenced by other bundles.
    let isExposed = !Array.from(asset.parentDeps).every(dep => {
      let depAsset = this.bundler.loadedAssets.get(dep.parent);
      return this.bundle.assets.has(depAsset) || depAsset.type !== 'js';
    });

    if (!isExposed) {
      let key = this.dedupeKey(asset);
      if (this.dedupe.has(key)) {
        return;
      }

      // Don't dedupe when HMR is turned on since it messes with the asset ids
      if (!this.options.hmr) {
        this.dedupe.set(key, asset.id);
      }
    }

    ......

    this.bundle.addOffset(asset, this.lineOffset);
    await this.writeModule(
      asset.id,
      asset.generated.js,
      deps,
      asset.generated.map
    );
  }

  ......
  
  async end() {
    let entry = [];

    // Add the HMR runtime if needed.
    if (this.options.hmr) {
      let asset = await this.bundler.getAsset(
        require.resolve('../builtins/hmr-runtime')
      );
      await this.addAssetToBundle(asset);
      entry.push(asset.id);
    }

    if (await this.writeBundleLoaders()) {
      entry.push(0);
    }

    if (this.bundle.entryAsset && this.externalModules.size === 0) {
      entry.push(this.bundle.entryAsset.id);
    }

    await this.write(
      '},{},' +
        JSON.stringify(entry) +
        ', ' +
        JSON.stringify(this.options.global || null) +
        ')'
    );
    if (this.options.sourceMaps) {
      // Add source map url if a map bundle exists
      let mapBundle = this.bundle.siblingBundlesMap.get('map');
      if (mapBundle) {
        let mapUrl = urlJoin(
          this.options.publicURL,
          path.basename(mapBundle.name)
        );
        await this.write(`\n//# sourceMappingURL=${mapUrl}`);
      }
    }
    await super.end();
  }
}

The methods to focus on here:

  • start — writes the preset client-side module loader (detailed later) into the output file
  • addAsset — writes asset.generated.js, together with the ids of the modules it depends on, into the output file in the format the module loader expects
  • end — writes the HMR client code and the sourceMaps URL into the output file; for dynamically imported modules, the corresponding loader-registration code is also written.

Surrounding techniques

How dependencies are collected

In the Asset processing above, one step collects dependencies (collectDependencies). This step differs by file type; let's walk through JSAsset as an example.

  1. First, in the pretransform stage, JSAsset generates an AST with @babel/core; the code lives in /transforms/babel/babel7.js:
  let res;
  if (asset.ast) {
    res = babel.transformFromAst(asset.ast, asset.contents, config);
  } else {
    res = babel.transformSync(asset.contents, config);
  }

  if (res.ast) {
    asset.ast = res.ast;
    asset.isAstDirty = true;
  }
  2. Then every node of the AST is traversed to collect dependencies.

The AST traversal is driven by babylon-walk:

const walk = require('babylon-walk');

collectDependencies() {
    walk.ancestor(this.ast, collectDependencies, this);
}

Here collectDependencies is a set of babel visitors. In short, when a node of a given type is encountered, the visitor for that type fires, letting us hook into entering or leaving each node.
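Visitor dispatch in miniature (walk and the hand-written AST below are simplified stand-ins for babylon-walk and a real parse result):

```javascript
// Walk an AST-shaped object and call the visitor whose key matches each
// node's type — the mechanism babylon-walk applies to collectDependencies.
function walk(node, visitors, state) {
  if (!node || typeof node.type !== 'string') return;
  if (visitors[node.type]) visitors[node.type](node, state);
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (Array.isArray(child)) child.forEach(c => walk(c, visitors, state));
    else if (child && typeof child === 'object') walk(child, visitors, state);
  }
}

// roughly what parsing `import { stat } from 'fs'` yields
const ast = {
  type: 'Program',
  body: [
    { type: 'ImportDeclaration', source: { type: 'StringLiteral', value: 'fs' } }
  ]
};

const deps = [];
walk(ast, {
  ImportDeclaration(node, state) { state.push(node.source.value); }
}, deps);

console.log(deps); // prints "[ 'fs' ]"
```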

Before looking at the dependency-collecting visitors, here is a refresher on the import/export forms of ES6 modules and the Node.js module system, and the declaration types they map to in the AST:

// ImportDeclaration
import { stat, exists, readFile } from 'fs';

// ExportNamedDeclaration with node.source = null;
export var year = 1958;

// ExportDefaultDeclaration;
export default function () {
  console.log('foo');
}

// ExportNamedDeclaration with node.source.value = 'my_module';
export { foo, bar } from 'my_module';

// CallExpression with node.callee.type = 'Import';
// node.arguments[0] is 'react';
import('react').then(...)

// CallExpression with node.callee.name = 'require';
// node.arguments[0] is 'react';
var react = require('react');

Besides these, there are two more special ways of pulling in dependencies:

// web Worker
new Worker('sw.js')

// service worker
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/sw-test/sw.js', { scope: '/sw-test/' }).then(function(reg) {
    // registration worked
    console.log('Registration succeeded. Scope is ' + reg.scope);
  }).catch(function(error) {
    // registration failed
    console.log('Registration failed with ' + error);
  });
}

Now the visitors in collectDependencies themselves:

module.exports = {
  ImportDeclaration(node, asset) {
    asset.isES6Module = true;
    addDependency(asset, node.source);
  },

  ExportNamedDeclaration(node, asset) {
    asset.isES6Module = true;
    if (node.source) {
      addDependency(asset, node.source);
    }
  },

  ExportAllDeclaration(node, asset) {
    asset.isES6Module = true;
    addDependency(asset, node.source);
  },

  ExportDefaultDeclaration(node, asset) {
    asset.isES6Module = true;
  },

  CallExpression(node, asset) {
    let {callee, arguments: args} = node;

    let isRequire =
      types.isIdentifier(callee) &&
      callee.name === 'require' &&
      args.length === 1 &&
      types.isStringLiteral(args[0]);

    if (isRequire) {
      addDependency(asset, args[0]);
      return;
    }

    let isDynamicImport =
      callee.type === 'Import' &&
      args.length === 1 &&
      types.isStringLiteral(args[0]);

    if (isDynamicImport) {
      asset.addDependency('_bundle_loader');
      addDependency(asset, args[0], {dynamic: true});

      node.callee = requireTemplate().expression;
      node.arguments[0] = argTemplate({MODULE: args[0]}).expression;
      asset.isAstDirty = true;
      return;
    }

    const isRegisterServiceWorker =
      types.isStringLiteral(args[0]) &&
      matchesPattern(callee, serviceWorkerPattern);

    if (isRegisterServiceWorker) {
      addURLDependency(asset, args[0]);
      return;
    }
  },

  NewExpression(node, asset) {
    const {callee, arguments: args} = node;

    const isWebWorker =
      callee.type === 'Identifier' &&
      callee.name === 'Worker' &&
      args.length === 1 &&
      types.isStringLiteral(args[0]);

    if (isWebWorker) {
      addURLDependency(asset, args[0]);
      return;
    }
  }
};

As you can see, every time a module import is encountered, addDependency is called. Dynamic imports (import()) are handled slightly differently, as detailed below.

The client-side module loader

First, the contents of a built JS bundle:

// modules are defined as an array
// [ module function, map of requires ]
//
// map of requires is short require name -> numeric require
//
// anything defined in a previous bundle is accessed via the
// orig method which is the require for previous bundles

// eslint-disable-next-line no-global-assign
parcelRequire = (function (modules, cache, entry, globalName) {
  // Save the require from previous bundle to this closure if any
  var previousRequire = typeof parcelRequire === 'function' && parcelRequire;
  var nodeRequire = typeof require === 'function' && require;

  function newRequire(name, jumped) {
    if (!cache[name]) {
      if (!modules[name]) {
        // if we cannot find the module within our internal map or
        // cache jump to the current global require ie. the last bundle
        // that was added to the page.
        var currentRequire = typeof parcelRequire === 'function' && parcelRequire;
        if (!jumped && currentRequire) {
          return currentRequire(name, true);
        }

        // If there are other bundles on this page the require from the
        // previous one is saved to 'previousRequire'. Repeat this as
        // many times as there are bundles until the module is found or
        // we exhaust the require chain.
        if (previousRequire) {
          return previousRequire(name, true);
        }

        // Try the node require function if it exists.
        if (nodeRequire && typeof name === 'string') {
          return nodeRequire(name);
        }

        var err = new Error('Cannot find module \'' + name + '\'');
        err.code = 'MODULE_NOT_FOUND';
        throw err;
      }

      localRequire.resolve = resolve;
      localRequire.cache = {};

      var module = cache[name] = new newRequire.Module(name);

      modules[name][0].call(module.exports, localRequire, module, module.exports, this);
    }

    return cache[name].exports;

    function localRequire(x){
      return newRequire(localRequire.resolve(x));
    }

    function resolve(x){
      return modules[name][1][x] || x;
    }
  }

  function Module(moduleName) {
    this.id = moduleName;
    this.bundle = newRequire;
    this.exports = {};
  }

  newRequire.isParcelRequire = true;
  newRequire.Module = Module;
  newRequire.modules = modules;
  newRequire.cache = cache;
  newRequire.parent = previousRequire;
  newRequire.register = function (id, exports) {
    modules[id] = [function (require, module) {
      module.exports = exports;
    }, {}];
  };

  for (var i = 0; i < entry.length; i++) {
    newRequire(entry[i]);
  }

  if (entry.length) {
    // Expose entry point to Node, AMD or browser globals
    // Based on https://github.com/ForbesLindesay/umd/blob/master/template.js
    var mainExports = newRequire(entry[entry.length - 1]);

    // CommonJS
    if (typeof exports === "object" && typeof module !== "undefined") {
      module.exports = mainExports;

    // RequireJS
    } else if (typeof define === "function" && define.amd) {
     define(function () {
       return mainExports;
     });

    // <script>
    } else if (globalName) {
      this[globalName] = mainExports;
    }
  }

  // Override the current require with this new one
  return newRequire;
})({"a.js":[function(require,module,exports) {
var name = 'tsy'; // console.log(Buffer);

module.exports = name;
},{}],"index.js":[function(require,module,exports) {
var a = require('./a.js');

console.log(a);
},{"./a.js":"a.js"}]},{},["index.js"], null)
//# sourceMappingURL=/parcel-demo.e31bb0bc.js.map

This is an immediately invoked function whose parameters are modules, cache, entry, and globalName:

  • modules holds every module in the current bundle — the assets of the Bundle class above. It is an object keyed by module name; each value is an array whose first element is the wrapped module body and whose second is a map of its dependencies. For example:
{"a.js":[function(require,module,exports) {
var name = 'tsy'; // console.log(Buffer);

module.exports = name;
},{}],"index.js":[function(require,module,exports) {
var a = require('./a.js');

console.log(a);
},{"./a.js":"a.js"}]}
  • entry lists the bundle's entry modules

The IIFE's main job is to iterate over the entry modules and call newRequire on each:

function newRequire(name, jumped) {
    if (!cache[name]) {
      if (!modules[name]) {
        // if we cannot find the module within our internal map or
        // cache jump to the current global require ie. the last bundle
        // that was added to the page.
        var currentRequire = typeof parcelRequire === 'function' && parcelRequire;
        if (!jumped && currentRequire) {
          return currentRequire(name, true);
        }

        // If there are other bundles on this page the require from the
        // previous one is saved to 'previousRequire'. Repeat this as
        // many times as there are bundles until the module is found or
        // we exhaust the require chain.
        if (previousRequire) {
          return previousRequire(name, true);
        }

        // Try the node require function if it exists.
        if (nodeRequire && typeof name === 'string') {
          return nodeRequire(name);
        }

        var err = new Error('Cannot find module \'' + name + '\'');
        err.code = 'MODULE_NOT_FOUND';
        throw err;
      }

      localRequire.resolve = resolve;
      localRequire.cache = {};

      var module = cache[name] = new newRequire.Module(name);

      modules[name][0].call(module.exports, localRequire, module, module.exports, this);
    }

    return cache[name].exports;

    function localRequire(x){
      return newRequire(localRequire.resolve(x));
    }

    function resolve(x){
      return modules[name][1][x] || x;
    }
}

function Module(moduleName) {
    this.id = moduleName;
    this.bundle = newRequire;
    this.exports = {};
}

Every file is a module, and inside each module there is a module object pointing at the current module. Parcel's module object has these properties:

  • id: the module's name
  • bundle: the newRequire function
  • exports: the values the module exposes to the outside

newRequire's logic:

  • Check whether the module object is already cached
    • If so, return cache[name].exports directly
    • If not, check whether modules[name] exists
      • If it exists, run var module = cache[name] = new newRequire.Module(name); modules[name][0].call(module.exports, localRequire, module, module.exports, this); — caching the module object and executing the module
      • If not, fall back in turn to other bundles' parcelRequire (previousRequire) and Node's require

When a module runs, localRequire, module, and module.exports are passed in as arguments — the require, module, and exports we use inside a module are exactly these parameters.

To sum up: each module is wrapped in a function that is handed require and exports plus a module convention, so even in browsers with no native module system we avoid polluting the global scope and still enjoy the benefits of modularity.

Dynamic imports

Next, dynamic imports, already touched on in JSAsset's collectDependencies above.

First, what happens when the JS node walk meets a dynamic import:

if (isDynamicImport) {
  asset.addDependency('_bundle_loader');

  addDependency(asset, args[0], {dynamic: true});

  node.callee = requireTemplate().expression;
  node.arguments[0] = argTemplate({MODULE: args[0]}).expression;
  asset.isAstDirty = true;
  return;
}

So whenever a resource is imported via import(), _bundle_loader is added to its dependency list and the expression is rewritten. Per the code above, a dynamic import like import('./a.js') in the AST is replaced with require('_bundle_loader')(require.resolve('./a.js')).

Some background: because such dynamic assets are marked dynamic: true, a separate bundle is generated as a child of the current bundle when the bundle tree is built, and the dynamic asset's info is recorded in the current bundle. The current bundle thus ends up with an array of packaged resources, e.g. [md5(dynamicAsset).js, md5(cssWithDynamicAsset).css, ..., assetId], made up of the packaged file names plus the module's id.

As described in the module loader section above, require.resolve('./a.js') actually returns the id of the ./a.js module:

function resolve(x){
   return modules[name][1][x] || x;
}

_bundle_loader is a built-in Parcel-bundler module, located at /src/builtins/bundle-loader.js:

var getBundleURL = require('./bundle-url').getBundleURL;

function loadBundlesLazy(bundles) {
  if (!Array.isArray(bundles)) {
    bundles = [bundles]
  }

  var id = bundles[bundles.length - 1];

  try {
    return Promise.resolve(require(id));
  } catch (err) {
    if (err.code === 'MODULE_NOT_FOUND') {
      return new LazyPromise(function (resolve, reject) {
        loadBundles(bundles.slice(0, -1))
          .then(function () {
            return require(id);
          })
          .then(resolve, reject);
      });
    }

    throw err;
  }
}

function loadBundles(bundles) {
  return Promise.all(bundles.map(loadBundle));
}

var bundleLoaders = {};
function registerBundleLoader(type, loader) {
  bundleLoaders[type] = loader;
}

module.exports = exports = loadBundlesLazy;
exports.load = loadBundles;
exports.register = registerBundleLoader;

var bundles = {};
function loadBundle(bundle) {
  var id;
  if (Array.isArray(bundle)) {
    id = bundle[1];
    bundle = bundle[0];
  }

  if (bundles[bundle]) {
    return bundles[bundle];
  }

  var type = (bundle.substring(bundle.lastIndexOf('.') + 1, bundle.length) || bundle).toLowerCase();
  var bundleLoader = bundleLoaders[type];
  if (bundleLoader) {
    return bundles[bundle] = bundleLoader(getBundleURL() + bundle)
      .then(function (resolved) {
        if (resolved) {
          module.bundle.register(id, resolved);
        }

        return resolved;
      }).catch(function(e) {
        delete bundles[bundle];
        
        throw e;
      });
  }
}

function LazyPromise(executor) {
  this.executor = executor;
  this.promise = null;
}

LazyPromise.prototype.then = function (onSuccess, onError) {
  if (this.promise === null) this.promise = new Promise(this.executor)
  return this.promise.then(onSuccess, onError)
};

LazyPromise.prototype.catch = function (onError) {
  if (this.promise === null) this.promise = new Promise(this.executor)
  return this.promise.catch(onError)
};

loadBundlesLazy first tries to require the module directly; if it is missing, it loads the bundles via loadBundles and then requires it.

loadBundles calls loadBundle for each bundle; loadBundle picks the loader matching the bundle's type and loads the bundle dynamically (a dynamically imported module becomes a new bundle of its own). Once loaded, the module is registered into the current bundle's modules, so subsequent requires can find it via modules[name].

The bundle-loader registration logic is written into the bundle during the packager's end() step described above (JSPackager shown):

// Generate a module to register the bundle loaders that are needed
let loads = 'var b=require(' + JSON.stringify(bundleLoader.id) + ');';
for (let bundleType of this.bundleLoaders) {
  let loader = this.options.bundleLoaders[bundleType];
  if (loader) {
    let target = this.options.target === 'node' ? 'node' : 'browser';
    let asset = await this.bundler.getAsset(loader[target]);
    await this.addAssetToBundle(asset);
    loads +=
      'b.register(' +
      JSON.stringify(bundleType) +
      ',require(' +
      JSON.stringify(asset.id) +
      '));';
  }
}

Which ultimately adds this into modules:

0:[function(require,module,exports) {
var b=require("../parcel/packages/core/parcel-bundler/src/builtins/bundle-loader.js");b.register("js",require("../parcel/packages/core/parcel-bundler/src/builtins/loaders/browser/js-loader.js"));
},{}]

Module 0 is also added to the bundle's entry list (so it executes on startup); this way loadBundle can obtain the right loader for dynamically loading modules. Taking js-loader as an example:

module.exports = function loadJSBundle(bundle) {
  return new Promise(function (resolve, reject) {
    var script = document.createElement('script');
    script.async = true;
    script.type = 'text/javascript';
    script.charset = 'utf-8';
    script.src = bundle;
    script.onerror = function (e) {
      script.onerror = script.onload = null;
      reject(e);
    };

    script.onload = function () {
      script.onerror = script.onload = null;
      resolve();
    };

    document.getElementsByTagName('head')[0].appendChild(script);
  });
};

After the resource loads, module.bundle.register(id, resolved) registers it into the current bundle's modules; the registration code was shown in the module loader section:

newRequire.register = function (id, exports) {
    modules[id] = [function (require, module) {
      module.exports = exports;
    }, {}];
};

With that, require can fetch the dynamically loaded resource directly.

Worker

Parcel uses child processes to speed up building the asset tree, especially the compile-to-AST phase. Ultimately it relies on Node's child_process, with some process-management work layered on top; let's dig in.

Workers are used in /src/bundler.js when loading assets (this.farm.run()) and are set up in start:

this.farm = await WorkerFarm.getShared(this.options, {
      workerPath: require.resolve('./worker.js')
});

Options and a workerPath are passed in; the module at workerPath implements the init and run interface used later by the workers — programming to an interface again.

The main worker code lives in @parcel/workers, with three important classes: WorkerFarm, Worker, and Child.

  • WorkerFarm is the entry point, managing all child processes
  • The Worker class manages a single child process — forking, callback handling, and so on
  • Child is the module running inside the child process; it receives commands from the parent over the IPC channel, invokes the corresponding method of the target module (here, ./worker.js), and sends the result back to the parent Worker over the channel

The parent sending commands to the child processes is an application of the command pattern.
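The command protocol reduces to "a message names a method plus its arguments" — sketched in-process below (the real code ships these objects over the IPC channel; the handler bodies here are invented):

```javascript
// The child looks the named method up on its handler module and replies
// with the result; unknown commands produce an error reply.
const handlers = {
  init(options) { return 'initialized'; },
  run(file) { return 'processed ' + file; }
};

function handleMessage({ method, args }) {
  if (typeof handlers[method] !== 'function') {
    throw new Error('Unknown command: ' + method);
  }
  return handlers[method](...args);
}

console.log(handleMessage({ method: 'run', args: ['index.js'] })); // prints "processed index.js"
```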

Watching file changes

File watching also runs in a child process, but the process management here is much simpler: spawn one child and send it commands. The child watches files with chokidar; when a file changes it messages the parent, and the parent emits the corresponding event.

handleEmit(event, data) {
    if (event === 'watcherError') {
      data = errorUtils.jsonToError(data);
    }

    this.emit(event, data);
}

HMR

HMR is implemented over WebSocket, with a server part and a client part.

Server side (/src/HMRServer.js):

class HMRServer {
  async start(options = {}) {
    await new Promise(async resolve => {
      if (!options.https) {
        this.server = http.createServer();
      } else if (typeof options.https === 'boolean') {
        this.server = https.createServer(generateCertificate(options));
      } else {
        this.server = https.createServer(await getCertificate(options.https));
      }

      let websocketOptions = {
        server: this.server
      };

      if (options.hmrHostname) {
        websocketOptions.origin = `${options.https ? 'https' : 'http'}://${
          options.hmrHostname
        }`;
      }

      this.wss = new WebSocket.Server(websocketOptions);
      this.server.listen(options.hmrPort, resolve);
    });

    this.wss.on('connection', ws => {
      ws.onerror = this.handleSocketError;
      if (this.unresolvedError) {
        ws.send(JSON.stringify(this.unresolvedError));
      }
    });

    this.wss.on('error', this.handleSocketError);

    return this.wss._server.address().port;
  }

  ......

  emitUpdate(assets) {
    if (this.unresolvedError) {
      this.unresolvedError = null;
      this.broadcast({
        type: 'error-resolved'
      });
    }

    const shouldReload = assets.some(asset => asset.hmrPageReload);
    if (shouldReload) {
      this.broadcast({
        type: 'reload'
      });
    } else {
      this.broadcast({
        type: 'update',
        assets: assets.map(asset => {
          let deps = {};
          for (let [dep, depAsset] of asset.depAssets) {
            deps[dep.name] = depAsset.id;
          }

          return {
            id: asset.id,
            generated: asset.generated,
            deps: deps
          };
        })
      });
    }
  }

  ......

  broadcast(msg) {
    const json = JSON.stringify(msg);
    for (let ws of this.wss.clients) {
      ws.send(json);
    }
  }
}

The start method creates the WebSocket server; when assets are updated, emitUpdate sends each asset's id and generated content to the clients.

Client-side logic:

var OVERLAY_ID = '__parcel__error__overlay__';

var OldModule = module.bundle.Module;

function Module(moduleName) {
  OldModule.call(this, moduleName);
  this.hot = {
    data: module.bundle.hotData,
    _acceptCallbacks: [],
    _disposeCallbacks: [],
    accept: function (fn) {
      this._acceptCallbacks.push(fn || function () {});
    },
    dispose: function (fn) {
      this._disposeCallbacks.push(fn);
    }
  };

  module.bundle.hotData = null;
}

module.bundle.Module = Module;

var parent = module.bundle.parent;
if ((!parent || !parent.isParcelRequire) && typeof WebSocket !== 'undefined') {
  var hostname = process.env.HMR_HOSTNAME || location.hostname;
  var protocol = location.protocol === 'https:' ? 'wss' : 'ws';
  var ws = new WebSocket(protocol + '://' + hostname + ':' + process.env.HMR_PORT + '/');
  ws.onmessage = function(event) {
    var data = JSON.parse(event.data);

    if (data.type === 'update') {
      console.clear();

      data.assets.forEach(function (asset) {
        hmrApply(global.parcelRequire, asset);
      });

      data.assets.forEach(function (asset) {
        if (!asset.isNew) {
          hmrAccept(global.parcelRequire, asset.id);
        }
      });
    }

    if (data.type === 'reload') {
      ws.close();
      ws.onclose = function () {
        location.reload();
      }
    }

    if (data.type === 'error-resolved') {
      console.log('[parcel] ✨ Error resolved');

      removeErrorOverlay();
    }

    if (data.type === 'error') {
      console.error('[parcel] 🚨  ' + data.error.message + '\n' + data.error.stack);

      removeErrorOverlay();

      var overlay = createErrorOverlay(data);
      document.body.appendChild(overlay);
    }
  };
}

......

function hmrApply(bundle, asset) {
  var modules = bundle.modules;
  if (!modules) {
    return;
  }

  if (modules[asset.id] || !bundle.parent) {
    var fn = new Function('require', 'module', 'exports', asset.generated.js);
    asset.isNew = !modules[asset.id];
    modules[asset.id] = [fn, asset.deps];
  } else if (bundle.parent) {
    hmrApply(bundle.parent, asset);
  }
}

function hmrAccept(bundle, id) {
  var modules = bundle.modules;
  if (!modules) {
    return;
  }

  if (!modules[id] && bundle.parent) {
    return hmrAccept(bundle.parent, id);
  }

  var cached = bundle.cache[id];
  bundle.hotData = {};
  if (cached) {
    cached.hot.data = bundle.hotData;
  }

  if (cached && cached.hot && cached.hot._disposeCallbacks.length) {
    cached.hot._disposeCallbacks.forEach(function (cb) {
      cb(bundle.hotData);
    });
  }

  delete bundle.cache[id];
  bundle(id);

  cached = bundle.cache[id];
  if (cached && cached.hot && cached.hot._acceptCallbacks.length) {
    cached.hot._acceptCallbacks.forEach(function (cb) {
      cb();
    });
    return true;
  }

  return getParents(global.parcelRequire, id).some(function (id) {
    return hmrAccept(global.parcelRequire, id)
  });
}

This mainly creates the WebSocket client and listens for update messages; when one arrives it replaces the corresponding entries in modules and re-executes the module via global.parcelRequire.
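The core of hmrApply/hmrAccept — swap the factory in the registry, drop the cached instance, re-require — can be reduced to a few lines. This is an illustrative sketch with made-up names, not the exact client code:

```javascript
// Registry with one module already loaded and cached.
var modules = { 'a.js': [function (require, module) { module.exports = 'v1'; }, {}] };
var cache = {};

function load(id) {
  if (!cache[id]) {
    cache[id] = { exports: {} };
    modules[id][0](load, cache[id], cache[id].exports);
  }
  return cache[id].exports;
}
load('a.js');  // 'v1' is now cached

// An HMR update arrives: compile the new source into a factory
// (hmrApply's `new Function('require', 'module', 'exports', ...)`)...
var fn = new Function('require', 'module', 'exports', "module.exports = 'v2';");
modules['a.js'] = [fn, {}];

// ...then invalidate and re-execute (hmrAccept's `delete bundle.cache[id]; bundle(id)`)
delete cache['a.js'];
var updated = load('a.js');  // 'v2'
```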

Node.js source code - net.createServer & net.createConnection & the 'data' event

With the earlier chapters on the event loop behind us, the other modules in Node.js should be much easier to follow. This chapter walks through the net-related implementation: net.createServer, net.createConnection, the connect event and the data event.

net.createServer

Typical usage looks like this:

const net = require('net');
const server = net.createServer((c) => {
  // 'connection' listener
  console.log('client connected');
  c.on('end', () => {
    console.log('client disconnected');
  });
  c.write('hello\r\n');
  c.pipe(c);
});
server.listen(8124, () => {
  console.log('server bound');
});

Here net.createServer is followed by server.listen, so these two functions are the focus below.

net.createServer

net.createServer([options][, connectionListener]) creates a new TCP or IPC server. Its entry point is in ./lib/net.js:

function createServer(options, connectionListener) {
  return new Server(options, connectionListener);
}

Let's look at the Server constructor:

function Server(options, connectionListener) {
  if (!(this instanceof Server))
    return new Server(options, connectionListener);

  EventEmitter.call(this);

  if (typeof options === 'function') {
    connectionListener = options;
    options = {};
    this.on('connection', connectionListener);
  } else if (options == null || typeof options === 'object') {
    options = options || {};

    if (typeof connectionListener === 'function') {
      this.on('connection', connectionListener);
    }
  } else {
    throw new ERR_INVALID_ARG_TYPE('options', 'Object', options);
  }

  this._connections = 0;

  Object.defineProperty(this, 'connections', {
    get: internalUtil.deprecate(() => {

      if (this._usingWorkers) {
        return null;
      }
      return this._connections;
    }, 'Server.connections property is deprecated. ' +
       'Use Server.getConnections method instead.', 'DEP0020'),
    set: internalUtil.deprecate((val) => (this._connections = val),
                                'Server.connections property is deprecated.',
                                'DEP0020'),
    configurable: true, enumerable: false
  });

  this[async_id_symbol] = -1;
  this._handle = null;
  this._usingWorkers = false;
  this._workers = [];
  this._unref = false;

  this.allowHalfOpen = options.allowHalfOpen || false;
  this.pauseOnConnect = !!options.pauseOnConnect;
}
util.inherits(Server, EventEmitter);

As we can see, the connectionListener is simply registered as a listener for the connection event.

server.listen

The server.listen method:

Server.prototype.listen = function(...args) {
  var normalized = normalizeArgs(args);
  var options = normalized[0];
  var cb = normalized[1];

  if (this._handle) {
    throw new ERR_SERVER_ALREADY_LISTEN();
  }

  var hasCallback = (cb !== null);
  if (hasCallback) {
    this.once('listening', cb);
  }
  var backlogFromArgs =
    // (handle, backlog) or (path, backlog) or (port, backlog)
    toNumber(args.length > 1 && args[1]) ||
    toNumber(args.length > 2 && args[2]);  // (port, host, backlog)

  options = options._handle || options.handle || options;
  // (handle[, backlog][, cb]) where handle is an object with a handle
  if (options instanceof TCP) {
    this._handle = options;
    this[async_id_symbol] = this._handle.getAsyncId();
    listenInCluster(this, null, -1, -1, backlogFromArgs);
    return this;
  }
  // (handle[, backlog][, cb]) where handle is an object with a fd
  if (typeof options.fd === 'number' && options.fd >= 0) {
    listenInCluster(this, null, null, null, backlogFromArgs, options.fd);
    return this;
  }

  // ([port][, host][, backlog][, cb]) where port is omitted,
  // that is, listen(), listen(null), listen(cb), or listen(null, cb)
  // or (options[, cb]) where options.port is explicitly set as undefined or
  // null, bind to an arbitrary unused port
  if (args.length === 0 || typeof args[0] === 'function' ||
      (typeof options.port === 'undefined' && 'port' in options) ||
      options.port === null) {
    options.port = 0;
  }
  // ([port][, host][, backlog][, cb]) where port is specified
  // or (options[, cb]) where options.port is specified
  // or if options.port is normalized as 0 before
  var backlog;
  if (typeof options.port === 'number' || typeof options.port === 'string') {
    if (!isLegalPort(options.port)) {
      throw new ERR_SOCKET_BAD_PORT(options.port);
    }
    backlog = options.backlog || backlogFromArgs;
    // start TCP server listening on host:port
    if (options.host) {
      lookupAndListen(this, options.port | 0, options.host, backlog,
                      options.exclusive);
    } else { // Undefined host, listens on unspecified address
      // Default addressType 4 will be used to search for master server
      listenInCluster(this, null, options.port | 0, 4,
                      backlog, undefined, options.exclusive);
    }
    return this;
  }

  // (path[, backlog][, cb]) or (options[, cb])
  // where path or options.path is a UNIX domain socket or Windows pipe
  if (options.path && isPipeName(options.path)) {
    var pipeName = this._pipeName = options.path;
    backlog = options.backlog || backlogFromArgs;
    listenInCluster(this, pipeName, -1, -1,
                    backlog, undefined, options.exclusive);
    return this;
  }

  throw new ERR_INVALID_OPT_VALUE('options', util.inspect(options));
};

listenInCluster is what starts listening and eventually leads to the connection event being emitted. Skipping a few links in the call chain, the real work happens in setupListenHandle:

function setupListenHandle(address, port, addressType, backlog, fd) {
  debug('setupListenHandle', address, port, addressType, backlog, fd);

  // If there is not yet a handle, we need to create one and bind.
  // In the case of a server sent via IPC, we don't need to do this.
  if (this._handle) {
    debug('setupListenHandle: have a handle already');
  } else {
    debug('setupListenHandle: create a handle');

    var rval = null;

    // Try to bind to the unspecified IPv6 address, see if IPv6 is available
    if (!address && typeof fd !== 'number') {
      rval = createServerHandle('::', port, 6, fd);

      if (typeof rval === 'number') {
        rval = null;
        address = '0.0.0.0';
        addressType = 4;
      } else {
        address = '::';
        addressType = 6;
      }
    }

    if (rval === null)
      rval = createServerHandle(address, port, addressType, fd);

    if (typeof rval === 'number') {
      var error = exceptionWithHostPort(rval, 'listen', address, port);
      process.nextTick(emitErrorNT, this, error);
      return;
    }
    this._handle = rval;
  }

  this[async_id_symbol] = getNewAsyncId(this._handle);
  this._handle.onconnection = onconnection;
  this._handle.owner = this;
  
   // Use a backlog of 512 entries. We pass 511 to the listen() call because
  // the kernel does: backlogsize = roundup_pow_of_two(backlogsize + 1);
  // which will thus give us a backlog of 512 entries.
  var err = this._handle.listen(backlog || 511);
}

Only the key code is shown here. It does three things (taking TCP as an example):

1.createServerHandle creates a server handle (a TCPWrap object)
2.assigns the onconnection function to the handle's onconnection property
3.calls the handle's listen method to start listening

createServerHandle instantiates a TCP object; TCP is defined in ./src/tcp_wrap.cc:

TCPWrap::TCPWrap(Environment* env, Local<Object> object, ProviderType provider)
    : ConnectionWrap(env, object, provider) {
  int r = uv_tcp_init(env->event_loop(), &handle_);
  CHECK_EQ(r, 0);  // How do we proxy this error up to javascript?
                   // Suggestion: uv_tcp_init() returns void.
}

uv_tcp_init is called here; we won't dig into it — essentially it creates the socket and binds it.

The server handle's listen method — that is, the listen method of the TCP object created above — looks like this:

void TCPWrap::Listen(const FunctionCallbackInfo<Value>& args) {
  TCPWrap* wrap;
  ASSIGN_OR_RETURN_UNWRAP(&wrap,
                          args.Holder(),
                          args.GetReturnValue().Set(UV_EBADF));
  int backlog = args[0]->Int32Value();
  int err = uv_listen(reinterpret_cast<uv_stream_t*>(&wrap->handle_),
                      backlog,
                      OnConnection);
  args.GetReturnValue().Set(err);
}

Ultimately uv_listen starts listening; uv_listen in turn simply calls uv__io_start, so that the I/O is handled in the poll phase of the event loop.

When a connection is accepted, the function stored on the TCP object's onconnection property is invoked — the onconnection function in net.js:

function onconnection(err, clientHandle) {
  var handle = this;
  var self = handle.owner;

  debug('onconnection');

  if (err) {
    self.emit('error', errnoException(err, 'accept'));
    return;
  }

  if (self.maxConnections && self._connections >= self.maxConnections) {
    clientHandle.close();
    return;
  }

  var socket = new Socket({
    handle: clientHandle,
    allowHalfOpen: self.allowHalfOpen,
    pauseOnCreate: self.pauseOnConnect,
    readable: true,
    writable: true
  });

  self._connections++;
  socket.server = self;
  socket._server = self;

  DTRACE_NET_SERVER_CONNECTION(socket);
  COUNTER_NET_SERVER_CONNECTION(socket);
  self.emit('connection', socket);
}

The code above essentially emits the connection event.

net.createConnection

net.createConnection is a factory function for net.Socket: it immediately initiates the connection with socket.connect() and returns the connecting net.Socket.

Going straight to net.createConnection in ./lib/net.js, it is really the connect function:

function connect(...args) {
  var normalized = normalizeArgs(args);
  var options = normalized[0];
  debug('createConnection', normalized);
  var socket = new Socket(options);

  if (options.timeout) {
    socket.setTimeout(options.timeout);
  }

  return Socket.prototype.connect.call(socket, normalized);
}

A Socket is instantiated here and the prototype's connect method is invoked on it.

Next, the Socket constructor and its connect method:

function Socket(options) {
  if (!(this instanceof Socket)) return new Socket(options);

  this.connecting = false;
  // Problem with this is that users can supply their own handle, that may not
  // have _handle.getAsyncId(). In this case an[async_id_symbol] should
  // probably be supplied by async_hooks.
  this[async_id_symbol] = -1;
  this._hadError = false;
  this._handle = null;
  this._parent = null;
  this._host = null;
  this[kLastWriteQueueSize] = 0;
  this[kTimeout] = null;

  if (typeof options === 'number')
    options = { fd: options }; // Legacy interface.
  else
    options = util._extend({}, options);

  options.readable = options.readable || false;
  options.writable = options.writable || false;
  const allowHalfOpen = options.allowHalfOpen;

  // Prevent the "no-half-open enforcer" from being inherited from `Duplex`.
  options.allowHalfOpen = true;
  // For backwards compat do not emit close on destroy.
  options.emitClose = false;
  stream.Duplex.call(this, options);

  // Default to *not* allowing half open sockets.
  this.allowHalfOpen = Boolean(allowHalfOpen);

  if (options.handle) {
    this._handle = options.handle; // private
    this[async_id_symbol] = getNewAsyncId(this._handle);
  } else if (options.fd !== undefined) {
    const fd = options.fd;
    this._handle = createHandle(fd, false);
    this._handle.open(fd);
    this[async_id_symbol] = this._handle.getAsyncId();
    // options.fd can be string (since it is user-defined),
    // so changing this to === would be semver-major
    // See: https://github.com/nodejs/node/pull/11513
    // eslint-disable-next-line eqeqeq
    if ((fd == 1 || fd == 2) &&
        (this._handle instanceof Pipe) &&
        process.platform === 'win32') {
      // Make stdout and stderr blocking on Windows
      var err = this._handle.setBlocking(true);
      if (err)
        throw errnoException(err, 'setBlocking');

      this._writev = null;
      this._write = makeSyncWrite(fd);
      // makeSyncWrite adjusts this value like the original handle would, so
      // we need to let it do that by turning it into a writable, own property.
      Object.defineProperty(this._handle, 'bytesWritten', {
        value: 0, writable: true
      });
    }
  }

  // shut down the socket when we're finished with it.
  this.on('end', onReadableStreamEnd);

  initSocketHandle(this);

  this._pendingData = null;
  this._pendingEncoding = '';

  // handle strings directly
  this._writableState.decodeStrings = false;

  // if we have a handle, then start the flow of data into the
  // buffer.  if not, then this will happen when we connect
  if (this._handle && options.readable !== false) {
    if (options.pauseOnCreate) {
      // stop the handle from reading and pause the stream
      this._handle.reading = false;
      this._handle.readStop();
      this.readableFlowing = false;
    } else if (!options.manualStart) {
      this.read(0);
    }
  }

  // Reserve properties
  this.server = null;
  this._server = null;

  // Used after `.destroy()`
  this[kBytesRead] = 0;
  this[kBytesWritten] = 0;
}
util.inherits(Socket, stream.Duplex);

Socket inherits from stream.Duplex (parasitic combination inheritance); the _handle property is set to a TCP instance (createHandle, introduced above, creates and returns one), and finally initSocketHandle initializes the handle.

With the constructor covered, we move on to connect, which ultimately calls internalConnect:

function internalConnect(
  self, address, port, addressType, localAddress, localPort) {
  // TODO return promise from Socket.prototype.connect which
  // wraps _connectReq.
	
  // ...

  if (addressType === 6 || addressType === 4) {
    const req = new TCPConnectWrap();
    req.oncomplete = afterConnect;
    req.address = address;
    req.port = port;
    req.localAddress = localAddress;
    req.localPort = localPort;

    if (addressType === 4)
      err = self._handle.connect(req, address, port);
    else
      err = self._handle.connect6(req, address, port);
  } else {
    const req = new PipeConnectWrap();
    req.address = address;
    req.oncomplete = afterConnect;

    err = self._handle.connect(req, address, afterConnect);
  }

  // ...
}

Here self._handle.connect() is invoked — the TCP object's connect method (TCPWrap::Connect). Over in tcp_wrap.cc:

void TCPWrap::Connect(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);

  // ...

  if (err == 0) {
    AsyncHooks::DefaultTriggerAsyncIdScope trigger_scope(wrap);
    ConnectWrap* req_wrap =
        new ConnectWrap(env, req_wrap_obj, AsyncWrap::PROVIDER_TCPCONNECTWRAP);
    err = uv_tcp_connect(req_wrap->req(),
                         &wrap->handle_,
                         reinterpret_cast<const sockaddr*>(&addr),
                         AfterConnect);
    req_wrap->Dispatched();
    if (err)
      delete req_wrap;
  }

  args.GetReturnValue().Set(err);
}

This mainly does two things:

1.instantiates a ConnectWrap
2.calls uv_tcp_connect to connect, passing AfterConnect as the callback

A quick look at uv_tcp_connect: it is a libuv function that ultimately calls uv__tcp_connect:

int uv__tcp_connect(uv_connect_t* req,
                    uv_tcp_t* handle,
                    const struct sockaddr* addr,
                    unsigned int addrlen,
                    uv_connect_cb cb) {
  int err;
  int r;

  assert(handle->type == UV_TCP);

  if (handle->connect_req != NULL)
    return UV_EALREADY;  /* FIXME(bnoordhuis) UV_EINVAL or maybe UV_EBUSY. */

  err = maybe_new_socket(handle,
                         addr->sa_family,
                         UV_STREAM_READABLE | UV_STREAM_WRITABLE);
  if (err)
    return err;

  handle->delayed_error = 0;

  do {
    errno = 0;
    r = connect(uv__stream_fd(handle), addr, addrlen);
  } while (r == -1 && errno == EINTR);

  /* We not only check the return value, but also check the errno != 0.
   * Because in rare cases connect() will return -1 but the errno
   * is 0 (for example, on Android 4.3, OnePlus phone A0001_12_150227)
   * and actually the tcp three-way handshake is completed.
   */
  if (r == -1 && errno != 0) {
    if (errno == EINPROGRESS)
      ; /* not an error */
    else if (errno == ECONNREFUSED)
    /* If we get a ECONNREFUSED wait until the next tick to report the
     * error. Solaris wants to report immediately--other unixes want to
     * wait.
     */
      handle->delayed_error = UV__ERR(errno);
    else
      return UV__ERR(errno);
  }

  uv__req_init(handle->loop, req, UV_CONNECT);
  req->cb = cb;
  req->handle = (uv_stream_t*) handle;
  QUEUE_INIT(&req->queue);
  handle->connect_req = req;

  uv__io_start(handle->loop, &handle->io_watcher, POLLOUT);

  if (handle->delayed_error)
    uv__io_feed(handle->loop, &handle->io_watcher);

  return 0;
}

This mainly does two things:

1.establishes the connection via connect
2.calls uv__io_start, which adds &handle->io_watcher to loop->watcher_queue and loop->watchers[w->fd], to be watched in the poll phase

So what does AfterConnect do — in other words, what happens once the connection is established? Let's look at its code:

template <typename WrapType, typename UVType>
void ConnectionWrap<WrapType, UVType>::AfterConnect(uv_connect_t* req,
                                                    int status) {
  // ...

  req_wrap->MakeCallback(env->oncomplete_string(), arraysize(argv), argv);

  delete req_wrap;
}

This mainly invokes the function stored on req_wrap's oncomplete property, which ends up in afterConnect in net.js:

function afterConnect(status, handle, req, readable, writable) {
  var self = handle.owner;

  // callback may come after call to destroy
  if (self.destroyed) {
    return;
  }

  // Update handle if it was wrapped
  // TODO(indutny): assert that the handle is actually an ancestor of old one
  handle = self._handle;

  debug('afterConnect');

  assert(self.connecting);
  self.connecting = false;
  self._sockname = null;

  if (status === 0) {
    self.readable = readable;
    self.writable = writable;
    self._unrefTimer();

    self.emit('connect');
    self.emit('ready');

    // start the first read, or get an immediate EOF.
    // this doesn't actually consume any bytes, because len=0.
    if (readable && !self.isPaused())
      self.read(0);

  } else {
    self.connecting = false;
    var details;
    if (req.localAddress && req.localPort) {
      details = req.localAddress + ':' + req.localPort;
    }
    var ex = exceptionWithHostPort(status,
                                   'connect',
                                   req.address,
                                   req.port,
                                   details);
    if (details) {
      ex.localAddress = req.localAddress;
      ex.localPort = req.localPort;
    }
    self.destroy(ex);
  }
}

其中emitconnectready事件。

The 'data' event

The 'data' event is emitted when data is received.

When data arrives on the stream, the call chain is:

StreamBase::ReadStartJS -> LibuvStreamWrap::ReadStart -> LibuvStreamWrap::OnUvRead -> StreamResource::EmitRead -> EmitToJSStreamListener::OnStreamRead -> CallJSOnreadMethod -> onread(net.js)

StreamBase::ReadStartJS is exposed on the readStart property of the JS object inside TCPWrap::Initialize, via LibuvStreamWrap::AddMethods(env, t, StreamBase::kFlagHasWritev);.

Let's go straight to the onread function in net.js:

// This function is called whenever the handle gets a
// buffer, or when there's an error reading.
function onread(nread, buffer) {
  var handle = this;
  var self = handle.owner;
  assert(handle === self._handle, 'handle != self._handle');

  self._unrefTimer();

  debug('onread', nread);

  if (nread > 0) {
    debug('got data');

    // read success.
    // In theory (and in practice) calling readStop right now
    // will prevent this from being called again until _read() gets
    // called again.

    // Optimization: emit the original buffer with end points
    var ret = self.push(buffer);

    if (handle.reading && !ret) {
      handle.reading = false;
      debug('readStop');
      var err = handle.readStop();
      if (err)
        self.destroy(errnoException(err, 'read'));
    }
    return;
  }

  // if we didn't get any bytes, that doesn't necessarily mean EOF.
  // wait for the next one.
  if (nread === 0) {
    debug('not any data, keep waiting');
    return;
  }

  // Error, possibly EOF.
  if (nread !== UV_EOF) {
    return self.destroy(errnoException(nread, 'read'));
  }

  debug('EOF');

  // push a null to signal the end of data.
  // Do it before `maybeDestroy` for correct order of events:
  // `end` -> `close`
  self.push(null);
  self.read(0);
}

self.push(buffer) here actually calls Readable.prototype.push, because the Socket class in net.js inherits from stream.Duplex, which in turn inherits from Readable. Readable.prototype.push looks like this:

// Manually shove something into the read() buffer.
// This returns true if the highWaterMark has not been hit yet,
// similar to how Writable.write() returns true if you should
// write() some more.
Readable.prototype.push = function(chunk, encoding) {
  var state = this._readableState;
  var skipChunkCheck;

  if (!state.objectMode) {
    if (typeof chunk === 'string') {
      encoding = encoding || state.defaultEncoding;
      if (encoding !== state.encoding) {
        chunk = Buffer.from(chunk, encoding);
        encoding = '';
      }
      skipChunkCheck = true;
    }
  } else {
    skipChunkCheck = true;
  }

  return readableAddChunk(this, chunk, encoding, false, skipChunkCheck);
};

This mainly calls readableAddChunk:

function readableAddChunk(stream, chunk, encoding, addToFront, skipChunkCheck) {
  debug('readableAddChunk', chunk);
  var state = stream._readableState;
  if (chunk === null) {
    state.reading = false;
    onEofChunk(stream, state);
  } else {
    var er;
    if (!skipChunkCheck)
      er = chunkInvalid(state, chunk);
    if (er) {
      stream.emit('error', er);
    } else if (state.objectMode || chunk && chunk.length > 0) {
      if (typeof chunk !== 'string' &&
          !state.objectMode &&
          Object.getPrototypeOf(chunk) !== Buffer.prototype) {
        chunk = Stream._uint8ArrayToBuffer(chunk);
      }

      if (addToFront) {
        if (state.endEmitted)
          stream.emit('error', new ERR_STREAM_UNSHIFT_AFTER_END_EVENT());
        else
          addChunk(stream, state, chunk, true);
      } else if (state.ended) {
        stream.emit('error', new ERR_STREAM_PUSH_AFTER_EOF());
      } else if (state.destroyed) {
        return false;
      } else {
        state.reading = false;
        if (state.decoder && !encoding) {
          chunk = state.decoder.write(chunk);
          if (state.objectMode || chunk.length !== 0)
            addChunk(stream, state, chunk, false);
          else
            maybeReadMore(stream, state);
        } else {
          addChunk(stream, state, chunk, false);
        }
      }
    } else if (!addToFront) {
      state.reading = false;
      maybeReadMore(stream, state);
    }
  }

  return needMoreData(state);
}

function addChunk(stream, state, chunk, addToFront) {
  if (state.flowing && state.length === 0 && !state.sync) {
    state.awaitDrain = 0;
    stream.emit('data', chunk);
  } else {
    // update the buffer info.
    state.length += state.objectMode ? 1 : chunk.length;
    if (addToFront)
      state.buffer.unshift(chunk);
    else
      state.buffer.push(chunk);

    if (state.needReadable)
      emitReadable(stream);
  }
  maybeReadMore(stream, state);
}

readableAddChunk calls addChunk, and addChunk fires stream.emit('data', chunk) — that is the 'data' event.

Summary

This chapter gave a quick tour of the net implementation. With the groundwork laid in the earlier chapters, reading a concrete module like this should feel much easier.

libuv source code - Event Loop

This chapter introduces libuv's event loop, starting with the loop's overall flow. libuv implements network I/O on top of the platform's poll mechanism and file I/O on top of a thread pool (threads, too, communicate via the poll mechanism); later we will look at how the thread pool hooks into the event loop.

Event loop flow

The event loop runs through the phases listed further below. The code:

int uv_run(uv_loop_t* loop, uv_run_mode mode) {
  int timeout;
  int r;
  int ran_pending;

  // alive if there are active handles or reqs
  r = uv__loop_alive(loop);
  if (!r)
    uv__update_time(loop);

  while (r != 0 && loop->stop_flag == 0) {
    uv__update_time(loop);
    uv__run_timers(loop);
    // run pending queue
    ran_pending = uv__run_pending(loop);
    // UV_LOOP_WATCHER_DEFINE: run the idle and prepare queues
    uv__run_idle(loop);
    uv__run_prepare(loop);

    timeout = 0;
    if ((mode == UV_RUN_ONCE && !ran_pending) || mode == UV_RUN_DEFAULT)
      // if active handles remain, return the time until the next timer fires
      timeout = uv_backend_timeout(loop);

    uv__io_poll(loop, timeout);
    uv__run_check(loop);
    uv__run_closing_handles(loop);

    if (mode == UV_RUN_ONCE) {
      /* UV_RUN_ONCE implies forward progress: at least one callback must have
       * been invoked when it returns. uv__io_poll() can return without doing
       * I/O (meaning: no callbacks) when its timeout expires - which means we
       * have pending timers that satisfy the forward progress constraint.
       *
       * UV_RUN_NOWAIT makes no guarantees about progress so it's omitted from
       * the check.
       */
      uv__update_time(loop);
      uv__run_timers(loop);
    }

    r = uv__loop_alive(loop);
    if (mode == UV_RUN_ONCE || mode == UV_RUN_NOWAIT)
      break;
  }

  /* The if statement lets gcc compile it to a conditional store. Avoids
   * dirtying a cache line.
   */
  if (loop->stop_flag != 0)
    loop->stop_flag = 0;

  return r;
}

The event loop can be broken into the following steps:

1.cache the current time
2.run the callbacks in the timer queue (a min-heap)
3.run I/O callbacks left pending from the previous iteration
4.run the callbacks in the idle queue
5.run the callbacks in the prepare queue
6.compute the poll timeout, i.e. the interval until the next timer is due
7.block in poll I/O, using the timeout computed in the previous step
8.run the callbacks in the check queue
9.run the callbacks in the close queue

The loop exits when either of the following holds:

1.the loop is no longer alive, i.e. there are no active handles or reqs
2.the mode is UV_RUN_ONCE or UV_RUN_NOWAIT

Let's pick out the important parts:

Deciding whether the loop is alive

Whether the loop is alive depends on whether there are active handles or reqs, or handles still being closed:

static int uv__loop_alive(const uv_loop_t* loop) {
  return uv__has_active_handles(loop) ||
         uv__has_active_reqs(loop) ||
         loop->closing_handles != NULL;
}

uv__run_timers

The code of uv__run_timers:

void uv__run_timers(uv_loop_t* loop) {
  struct heap_node* heap_node;
  uv_timer_t* handle;

  for (;;) {
    // take the minimum node from the timer heap
    heap_node = heap_min((struct heap*) &loop->timer_heap);
    if (heap_node == NULL)
      break;

    // locate the start of the struct from heap_node, yielding the handle
    handle = container_of(heap_node, uv_timer_t, heap_node);
    // not due yet
    if (handle->timeout > loop->time)
      break;

    // uv__active_handle_rm
    uv_timer_stop(handle);
    uv_timer_again(handle);
    handle->timer_cb(handle);
  }
}

Note that timer nodes are stored in a min-heap keyed on handle->timeout. Each iteration of the loop does the following:

1.take the node with the smallest timeout from the heap — the one due to fire first
2.if even that node is not due yet, break out of the loop
3.if it is due, heap_remove deletes the node from the heap, uv__active_handle_rm decrements loop->active_handles, and the timer callback runs
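The steps above can be sketched in a few lines. For illustration a sorted array stands in for the min-heap (libuv's real heap lives in heap-inl.h); the point is only the pop-while-due control flow:

```javascript
// Pop timers off the "heap" until the front one is not yet due.
function runTimers(heap, now) {
  const fired = [];
  while (heap.length && heap[0].timeout <= now) {
    const handle = heap.shift();   // heap_min + heap_remove
    fired.push(handle.cb());       // handle->timer_cb(handle)
  }
  return fired;
}

const heap = [
  { timeout: 5,  cb: () => 'a' },
  { timeout: 10, cb: () => 'b' },
  { timeout: 50, cb: () => 'c' },  // not due yet, stays in the heap
];
const fired = runTimers(heap, 20); // pretend loop->time == 20
```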

uv__run_pending

uv__run_pending drains loop->pending_queue and runs each callback:

static int uv__run_pending(uv_loop_t* loop) {
  QUEUE* q;
  QUEUE pq;
  uv__io_t* w;

  if (QUEUE_EMPTY(&loop->pending_queue))
    return 0;

  QUEUE_MOVE(&loop->pending_queue, &pq);

  while (!QUEUE_EMPTY(&pq)) {
    q = QUEUE_HEAD(&pq);
    QUEUE_REMOVE(q);
    QUEUE_INIT(q);
    w = QUEUE_DATA(q, uv__io_t, pending_queue);
    w->cb(loop, w, POLLOUT);
  }

  return 1;
}

uv__run_idle and uv__run_prepare, which follow, work similarly.

poll I/O

poll I/O is the heart of the event loop and is built on I/O multiplexing: all network operations use non-blocking sockets driven by the best poll mechanism each platform offers (epoll on Linux, kqueue on macOS, and so on), while all file I/O goes through the thread pool — with inter-thread communication again based on the same poll mechanism.

The uv__io_poll below is the Linux implementation based on epoll; other platforms are similar. The code:

void uv__io_poll(uv_loop_t* loop, int timeout) {
  /* A bug in kernels < 2.6.37 makes timeouts larger than ~30 minutes
   * effectively infinite on 32 bits architectures.  To avoid blocking
   * indefinitely, we cap the timeout and poll again if necessary.
   *
   * Note that "30 minutes" is a simplification because it depends on
   * the value of CONFIG_HZ.  The magic constant assumes CONFIG_HZ=1200,
   * that being the largest value I have seen in the wild (and only once.)
   */
  static const int max_safe_timeout = 1789569;
  static int no_epoll_pwait;
  static int no_epoll_wait;
  struct uv__epoll_event events[1024];
  struct uv__epoll_event* pe;
  struct uv__epoll_event e;
  int real_timeout;
  QUEUE* q;
  uv__io_t* w;
  sigset_t sigset;
  uint64_t sigmask;
  uint64_t base;
  int have_signals;
  int nevents;
  int count;
  int nfds;
  int fd;
  int op;
  int i;

  // loop->watchers[w->fd] = w in uv__io_start func
  if (loop->nfds == 0) {
    assert(QUEUE_EMPTY(&loop->watcher_queue));
    return;
  }

  // take the fds off the watcher queue and register them via uv__epoll_ctl
  while (!QUEUE_EMPTY(&loop->watcher_queue)) {
    q = QUEUE_HEAD(&loop->watcher_queue);
    QUEUE_REMOVE(q);
    QUEUE_INIT(q);
	
	 // QUEUE_DATA works like container_of
    w = QUEUE_DATA(q, uv__io_t, watcher_queue);
    assert(w->pevents != 0);
    assert(w->fd >= 0);
    assert(w->fd < (int) loop->nwatchers);

    e.events = w->pevents;
    e.data = w->fd;

    if (w->events == 0)
      op = UV__EPOLL_CTL_ADD;
    else
      op = UV__EPOLL_CTL_MOD;

    /* XXX Future optimization: do EPOLL_CTL_MOD lazily if we stop watching
     * events, skip the syscall and squelch the events after epoll_wait().
     */
    // fd = uv__epoll_create1(UV__EPOLL_CLOEXEC); loop->backend_fd = fd;
    if (uv__epoll_ctl(loop->backend_fd, op, w->fd, &e)) {
      if (errno != EEXIST)
        abort();

      assert(op == UV__EPOLL_CTL_ADD);

      /* We've reactivated a file descriptor that's been watched before. */
      if (uv__epoll_ctl(loop->backend_fd, UV__EPOLL_CTL_MOD, w->fd, &e))
        abort();
    }

    w->events = w->pevents;
  }

  sigmask = 0;
  if (loop->flags & UV_LOOP_BLOCK_SIGPROF) {
    sigemptyset(&sigset);
    sigaddset(&sigset, SIGPROF);
    sigmask |= 1 << (SIGPROF - 1);
  }

  assert(timeout >= -1);
  base = loop->time;
  count = 48; /* Benchmarks suggest this gives the best throughput. */
  real_timeout = timeout;

  for (;;) {
    /* See the comment for max_safe_timeout for an explanation of why
     * this is necessary.  Executive summary: kernel bug workaround.
     */
    if (sizeof(int32_t) == sizeof(long) && timeout >= max_safe_timeout)
      timeout = max_safe_timeout;

    if (sigmask != 0 && no_epoll_pwait != 0)
      if (pthread_sigmask(SIG_BLOCK, &sigset, NULL))
        abort();

    if (no_epoll_wait != 0 || (sigmask != 0 && no_epoll_pwait == 0)) {
      // 返回需要处理的事件数目
      nfds = uv__epoll_pwait(loop->backend_fd,
                             events,
                             ARRAY_SIZE(events),
                             timeout,
                             sigmask);
      if (nfds == -1 && errno == ENOSYS)
        no_epoll_pwait = 1;
    } else {
      nfds = uv__epoll_wait(loop->backend_fd,
                            events,
                            ARRAY_SIZE(events),
                            timeout);
      if (nfds == -1 && errno == ENOSYS)
        no_epoll_wait = 1;
    }

    if (sigmask != 0 && no_epoll_pwait != 0)
      if (pthread_sigmask(SIG_UNBLOCK, &sigset, NULL))
        abort();

    /* Update loop->time unconditionally. It's tempting to skip the update when
     * timeout == 0 (i.e. non-blocking poll) but there is no guarantee that the
     * operating system didn't reschedule our process while in the syscall.
     */
    SAVE_ERRNO(uv__update_time(loop));

    if (nfds == 0) {
      assert(timeout != -1);

      if (timeout == 0)
        return;

      /* We may have been inside the system call for longer than |timeout|
       * milliseconds so we need to update the timestamp to avoid drift.
       */
      // 没有需要处理的事件
      goto update_timeout;
    }

    if (nfds == -1) {
      if (errno == ENOSYS) {
        /* epoll_wait() or epoll_pwait() failed, try the other system call. */
        assert(no_epoll_wait == 0 || no_epoll_pwait == 0);
        continue;
      }

      if (errno != EINTR)
        abort();

      if (timeout == -1)
        continue;

      if (timeout == 0)
        return;

      /* Interrupted by a signal. Update timeout and poll again. */
      goto update_timeout;
    }

    have_signals = 0;
    nevents = 0;

    assert(loop->watchers != NULL);
    loop->watchers[loop->nwatchers] = (void*) events;
    loop->watchers[loop->nwatchers + 1] = (void*) (uintptr_t) nfds;
    for (i = 0; i < nfds; i++) {
      pe = events + i;
      // (*pe).data
      fd = pe->data;

      /* Skip invalidated events, see uv__platform_invalidate_fd */
      if (fd == -1)
        continue;

      assert(fd >= 0);
      assert((unsigned) fd < loop->nwatchers);

      w = loop->watchers[fd];

      if (w == NULL) {
        /* File descriptor that we've stopped watching, disarm it.
         *
         * Ignore all errors because we may be racing with another thread
         * when the file descriptor is closed.
         */
        // 从红黑树中删除fd
        uv__epoll_ctl(loop->backend_fd, UV__EPOLL_CTL_DEL, fd, pe);
        continue;
      }

      /* Give users only events they're interested in. Prevents spurious
       * callbacks when previous callback invocation in this loop has stopped
       * the current watcher. Also, filters out events that users has not
       * requested us to watch.
       */
      pe->events &= w->pevents | POLLERR | POLLHUP;

      /* Work around an epoll quirk where it sometimes reports just the
       * EPOLLERR or EPOLLHUP event.  In order to force the event loop to
       * move forward, we merge in the read/write events that the watcher
       * is interested in; uv__read() and uv__write() will then deal with
       * the error or hangup in the usual fashion.
       *
       * Note to self: happens when epoll reports EPOLLIN|EPOLLHUP, the user
       * reads the available data, calls uv_read_stop(), then sometime later
       * calls uv_read_start() again.  By then, libuv has forgotten about the
       * hangup and the kernel won't report EPOLLIN again because there's
       * nothing left to read.  If anything, libuv is to blame here.  The
       * current hack is just a quick bandaid; to properly fix it, libuv
       * needs to remember the error/hangup event.  We should get that for
       * free when we switch over to edge-triggered I/O.
       */
      if (pe->events == POLLERR || pe->events == POLLHUP)
        pe->events |= w->pevents & (POLLIN | POLLOUT | UV__POLLPRI);

      if (pe->events != 0) {
        /* Run signal watchers last.  This also affects child process watchers
         * because those are implemented in terms of signal watchers.
         */
        if (w == &loop->signal_io_watcher)
          have_signals = 1;
        else
          // uv__async_io, uv__async_start中的uv__io_init注册
          w->cb(loop, w, pe->events);

        nevents++;
      }
    }

    if (have_signals != 0)
      loop->signal_io_watcher.cb(loop, &loop->signal_io_watcher, POLLIN);

    loop->watchers[loop->nwatchers] = NULL;
    loop->watchers[loop->nwatchers + 1] = NULL;

    if (have_signals != 0)
      return;  /* Event loop should cycle now so don't poll again. */

    if (nevents != 0) {
      if (nfds == ARRAY_SIZE(events) && --count != 0) {
        /* Poll for more events but don't block this time. */
        timeout = 0;
        continue;
      }
      return;
    }

    if (timeout == 0)
      return;

    if (timeout == -1)
      continue;

update_timeout:
    assert(timeout > 0);

    real_timeout -= (loop->time - base);
    if (real_timeout <= 0)
      return;

    timeout = real_timeout;
  }
}

这里主要做了如下几件事:

1.取出loop->watcher_queue中所有对象的uv__io_t handle(w),调用uv__epoll_ctl来监听w.fd
2.循环阻塞调用uv__epoll_pwait,其返回当时需要处理的事件数目
3.如果当前没有要处理的事件,检查是否超时
4.如果有需要处理的事件,那么从loop->watchers根据相应的fd取出uv__io_t handle w,调用w.cb()执行其对应的回调

这里需要注意的有以下几点:

loop->backend_fd

uv__epoll_ctl(loop->backend_fd, op, w->fd, &e)中的loop->backend_fd是epoll实例对应的文件描述符。了解epoll的同学会知道,内核在高速缓冲区中以红黑树管理该epoll实例所监听的所有fd,backend_fd可以看作这棵红黑树的入口。

其在uv__platform_loop_init中被赋值,代码如下:

fd = uv__epoll_create1(UV__EPOLL_CLOEXEC);

loop->watchers

epoll通过调用uv__epoll_pwait来获取需要处理的事件,参数events用来从内核得到事件的集合,这也是epoll的优势之一(共享内存的方式)。我们从events中取出相应的fd,然后根据fd从loop->watchers中取出handle并执行其callback,那么loop->watchers是如何初始化的呢?

void uv__io_start(uv_loop_t* loop, uv__io_t* w, unsigned int events) {
  assert(0 == (events & ~(POLLIN | POLLOUT | UV__POLLRDHUP | UV__POLLPRI)));
  assert(0 != events);
  assert(w->fd >= 0);
  assert(w->fd < INT_MAX);

  w->pevents |= events;
  maybe_resize(loop, w->fd + 1);

#if !defined(__sun)
  /* The event ports backend needs to rearm all file descriptors on each and
   * every tick of the event loop but the other backends allow us to
   * short-circuit here if the event mask is unchanged.
   */
  if (w->events == w->pevents)
    return;
#endif

  if (QUEUE_EMPTY(&w->watcher_queue))
    QUEUE_INSERT_TAIL(&loop->watcher_queue, &w->watcher_queue);

  if (loop->watchers[w->fd] == NULL) {
    loop->watchers[w->fd] = w;
    loop->nfds++;
  }
}

其在uv__io_start中被初始化。loop->watchers是一个数组,其index就是uv__io_t handle中的fd,这样我们根据fd可以轻松地找出其uv__io_t handle。

uv__io_start在多处被用到,包括uv__async_start中调用uv__io_start来监听线程间通信用到的fd,还有在tcp、udp模块中都有用其监听fd。

我们可以看出,IO事件都会调用 uv__io_start 函数,该函数将需要监听的事件保存到 event loop的watcher_queue队列中

超时

我们发现uv__io_poll其实是阻塞的。为了解决阻塞的问题,调用时传入了timeout参数,它表示距离下一个timer到期(超过timer的timeout)的时间;当没有要处理的事件时,会根据进入uv__io_poll时记录的时间(base)计算是否已经超时、是否需要返回。update_timeout的代码如下:

assert(timeout > 0);

real_timeout -= (loop->time - base);
if (real_timeout <= 0)
  return;

timeout = real_timeout;

线程池实现文件异步I/O

libuv的文件I/O基于线程池实现,大致流程是:主线程将任务提交到任务队列,并向线程池发送信号;线程池中的worker收到信号后,从任务队列中取出任务执行;工作线程执行完任务后,调用uv_async_send将对应uv_async_t handle的pending置1,并向一个专用fd写入数据来通知主线程(该fd同样由epoll管理);主线程监听到该fd上的事件后,执行pending为1的uv_async_t handle对应的回调,经过层层回调,最终调用到用户注册的回调函数。

说到线程池,几乎所有线程池的实现都遵循如下模型,也就是任务队列+线程池的模型,libuv的实现也是基于此。

libuv中任务队列基于一个双向链表,其中的任务的struct声明如下:

struct uv__work {
  void (*work)(struct uv__work *w);
  void (*done)(struct uv__work *w, int status);
  struct uv_loop_s* loop;
  void* wq[2];
};

我们可以看到,其中work代表线程池实际要做的工作,done代表任务执行后的callback,wq数组为两个指针,分别指向任务队列中的前后节点。

下面我们首先看一下主线程如何提交任务到任务队列:

首先在fs.c中有这样一段逻辑,其中所有的文件操作都会调用POST,代码如下:

#define POST                                                                  \
  do {                                                                        \
    if (cb != NULL) {                                                         \
      uv__work_submit(loop, &req->work_req, uv__fs_work, uv__fs_done);        \
      return 0;                                                               \
    }                                                                         \
    else {                                                                    \
      /* 回调为 NULL 时为同步调用 */                                           \
      uv__fs_work(&req->work_req);                                            \
      return req->result;                                                     \
    }                                                                         \
  }                                                                           \
  while (0)

// 操作完成后的回调函数
static void uv__fs_done(struct uv__work* w, int status) {
  uv_fs_t* req;

  req = container_of(w, uv_fs_t, work_req);
  uv__req_unregister(req->loop, req);

  if (status == -ECANCELED) {
    assert(req->result == 0);
    req->result = -ECANCELED;
  }

  req->cb(req);  // 调用用户注册的回调
}

POST宏中调用了uv__work_submit将任务提交到队列,下面我们看下uv__work_submit的代码:

void uv__work_submit(uv_loop_t* loop,
                     struct uv__work* w,
                     void (*work)(struct uv__work* w),
                     void (*done)(struct uv__work* w, int status)) {
  uv_once(&once, init_once);
  w->loop = loop;
  w->work = work;
  w->done = done;
  post(&w->wq);
}

这里主要做了两件事:

1.初始化线程池,这里利用了&once,来保证只执行一次,在这里我们也可以看出,libuv中的线程池是在第一次使用时被初始化
2.post提交

uv__work_submit这块涉及的逻辑如下:

static void init_once(void) {
  unsigned int i;
  const char* val;
  uv_sem_t sem;

  // UV_THREADPOOL_SIZE决定线程池中线程的数量
  nthreads = ARRAY_SIZE(default_threads);
  val = getenv("UV_THREADPOOL_SIZE");
  if (val != NULL)
    nthreads = atoi(val);
  if (nthreads == 0)
    nthreads = 1;
  if (nthreads > MAX_THREADPOOL_SIZE)
    nthreads = MAX_THREADPOOL_SIZE;

  threads = default_threads;
  if (nthreads > ARRAY_SIZE(default_threads)) {
    threads = uv__malloc(nthreads * sizeof(threads[0]));
    if (threads == NULL) {
      nthreads = ARRAY_SIZE(default_threads);
      threads = default_threads;
    }
  }

  if (uv_cond_init(&cond))
    abort();

  if (uv_mutex_init(&mutex))
    abort();

  QUEUE_INIT(&wq);

  if (uv_sem_init(&sem, 0))
    abort();

  for (i = 0; i < nthreads; i++)
    if (uv_thread_create(threads + i, worker, &sem))
      abort();

  for (i = 0; i < nthreads; i++)
    uv_sem_wait(&sem);

  uv_sem_destroy(&sem);
}

/* To avoid deadlock with uv_cancel() it's crucial that the worker
 * never holds the global mutex and the loop-local mutex at the same time.
 */
static void worker(void* arg) {
  struct uv__work* w;
  QUEUE* q;

  uv_sem_post((uv_sem_t*) arg);
  arg = NULL;

  for (;;) {
    uv_mutex_lock(&mutex);

    while (QUEUE_EMPTY(&wq)) {
      idle_threads += 1;
      uv_cond_wait(&cond, &mutex);
      idle_threads -= 1;
    }

    q = QUEUE_HEAD(&wq);

    if (q == &exit_message)
      uv_cond_signal(&cond);
    else {
      QUEUE_REMOVE(q);
      QUEUE_INIT(q);  /* Signal uv_cancel() that the work req is
                             executing. */
    }

    uv_mutex_unlock(&mutex);

    if (q == &exit_message)
      break;

    w = QUEUE_DATA(q, struct uv__work, wq);
    w->work(w);

    uv_mutex_lock(&w->loop->wq_mutex);
    w->work = NULL;  /* Signal uv_cancel() that the work req is done
                        executing. */
    QUEUE_INSERT_TAIL(&w->loop->wq, &w->wq);
    uv_async_send(&w->loop->wq_async);
    uv_mutex_unlock(&w->loop->wq_mutex);
  }
}


static void post(QUEUE* q) {
  uv_mutex_lock(&mutex);
  QUEUE_INSERT_TAIL(&wq, q);
  if (idle_threads > 0)
    uv_cond_signal(&cond);
  uv_mutex_unlock(&mutex);
}

这里需要关注的有以下几点:

1.init_once关键代码其实就是获取线程池中线程的数量并创建对应数量的线程,每个线程中执行worker函数,
2.线程池中线程数量从UV_THREADPOOL_SIZE环境变量中获取,默认是4
3.在worker中,工作线程等待cond信号,如果有,则取任务队列中的任务来执行,执行后调用uv_async_send通知主线程,后面会详细介绍uv_async_send
4.post方法用来将wq插入到任务队列,并发出信号

我们再来看下工作线程执行完任务后是如何通知主线程的,也就是上述的uv_async_send方法:

int uv_async_send(uv_async_t* handle) {
  /* Do a cheap read first. */
  if (ACCESS_ONCE(int, handle->pending) != 0)
    return 0;

  if (cmpxchgi(&handle->pending, 0, 1) == 0)
    uv__async_send(&handle->loop->async_watcher);

  return 0;
}

void uv__async_send(struct uv__async* wa) {
  const void* buf;
  ssize_t len;
  int fd;
  int r;

  buf = "";
  len = 1;
  fd = wa->wfd;

#if defined(__linux__)
  if (fd == -1) {
    static const uint64_t val = 1;
    buf = &val;
    len = sizeof(val);
    fd = wa->io_watcher.fd;  /* eventfd */
  }
#endif

  do
    r = write(fd, buf, len);
  while (r == -1 && errno == EINTR);

  if (r == len)
    return;

  if (r == -1)
    if (errno == EAGAIN || errno == EWOULDBLOCK)
      return;

  abort();
}

这里主要做了如下几件事:

1.通过cmpxchgi将uv_async_t handle(也就是&w->loop->wq_async)的pending从0置为1,表示有事件等待主线程处理;若pending已为1则直接返回,避免重复写入
2.调用uv__async_send方法,向handle->loop->async_watcher->io_watcher.fd写入一个空字节(主线程epoll会监听到)

当主线程监听到async_watcher->io_watcher.fd的变化后,通过层层回调,最终调用uv__work的done函数,也就是用户注册的回调。这部分我们首先从前向后看下回调的注册:

// async.c
int uv_async_init(uv_loop_t* loop, uv_async_t* handle, uv_async_cb async_cb) {
  int err;

  err = uv__async_start(loop);
  if (err)
    return err;

  uv__handle_init(loop, (uv_handle_t*)handle, UV_ASYNC);
  handle->async_cb = async_cb;
  handle->pending = 0;

  // 加入到async_handles上
  QUEUE_INSERT_TAIL(&loop->async_handles, &handle->queue);
  uv__handle_start(handle);

  return 0;
}

// async.c
// 将loop->async_io_watcher.fd加入loop->watcher_queue监听
static int uv__async_start(uv_loop_t* loop) {
  int pipefd[2];
  int err;

  if (loop->async_io_watcher.fd != -1)
    return 0;

  err = uv__async_eventfd();
  if (err >= 0) {
    pipefd[0] = err;
    pipefd[1] = -1;
  }
  else if (err == UV_ENOSYS) {
    err = uv__make_pipe(pipefd, UV__F_NONBLOCK);
#if defined(__linux__)
    /* Save a file descriptor by opening one of the pipe descriptors as
     * read/write through the procfs.  That file descriptor can then
     * function as both ends of the pipe.
     */
    if (err == 0) {
      char buf[32];
      int fd;

      snprintf(buf, sizeof(buf), "/proc/self/fd/%d", pipefd[0]);
      fd = uv__open_cloexec(buf, O_RDWR);
      if (fd >= 0) {
        uv__close(pipefd[0]);
        uv__close(pipefd[1]);
        pipefd[0] = fd;
        pipefd[1] = fd;
      }
    }
#endif
  }

  if (err < 0)
    return err;

  // 注册 async io 事件的 callback 为 uv__async_io
  // loop->async_io_watcher注册fd等
  uv__io_init(&loop->async_io_watcher, uv__async_io, pipefd[0]);
  // 将该 io_watcher 添加到 loop->watcher_queue, epoll会取出
  uv__io_start(loop, &loop->async_io_watcher, POLLIN);
  loop->async_wfd = pipefd[1];

  return 0;
}

// core.c
void uv__io_init(uv__io_t* w, uv__io_cb cb, int fd) {
  assert(cb != NULL);
  assert(fd >= -1);
  QUEUE_INIT(&w->pending_queue);
  QUEUE_INIT(&w->watcher_queue);
  w->cb = cb;
  w->fd = fd;
  w->events = 0;
  w->pevents = 0;

#if defined(UV_HAVE_KQUEUE)
  w->rcount = 0;
  w->wcount = 0;
#endif /* defined(UV_HAVE_KQUEUE) */
}

// core.c
void uv__io_start(uv_loop_t* loop, uv__io_t* w, unsigned int events) {
  assert(0 == (events & ~(POLLIN | POLLOUT | UV__POLLRDHUP | UV__POLLPRI)));
  assert(0 != events);
  assert(w->fd >= 0);
  assert(w->fd < INT_MAX);

  w->pevents |= events;
  maybe_resize(loop, w->fd + 1);

#if !defined(__sun)
  /* The event ports backend needs to rearm all file descriptors on each and
   * every tick of the event loop but the other backends allow us to
   * short-circuit here if the event mask is unchanged.
   */
  if (w->events == w->pevents)
    return;
#endif

  if (QUEUE_EMPTY(&w->watcher_queue))
    QUEUE_INSERT_TAIL(&loop->watcher_queue, &w->watcher_queue);

  if (loop->watchers[w->fd] == NULL) {
    loop->watchers[w->fd] = w;
    loop->nfds++;
  }
}

这块按照执行顺序做了如下几件事:

1.uv_loop_init中调用uv_async_init,其内部通过uv__async_start初始化loop->async_io_watcher.fd,同时将uv_async_t handle(即loop->wq_async)加入到loop->async_handles中
2.uv__async_start调用uv__io_init和uv__io_start
3.uv__io_init注册 async io 事件的 callback 为 uv__async_io,并在loop->async_io_watcher上注册fd
4.uv__io_start将loop->async_io_watcher.fd加入loop->watcher_queue供epoll监听,同时在loop->watchers中通过fd注册loop->async_io_watcher

现在我们来梳理下当主线程接收到事件后,如何层层回调,最终执行uv__work的done即用户提交的回调函数。

在uv__io_poll方法中,通过uv__epoll_pwait监听到事件后,会从loop->watchers中取出uv__io_start时注册的uv__io_t(也就是上面注册的loop->async_io_watcher),然后执行其注册的回调(uv__async_io)。

uv__async_io代码如下:

static void uv__async_io(uv_loop_t* loop, uv__io_t* w, unsigned int events) {
  char buf[1024];
  ssize_t r;
  QUEUE queue;
  QUEUE* q;
  uv_async_t* h;

  assert(w == &loop->async_io_watcher);

  // 将在uv__async_send()中向fd中写入的数据取干净
  for (;;) {
    r = read(w->fd, buf, sizeof(buf));

    if (r == sizeof(buf))
      continue;

    if (r != -1)
      break;

    if (errno == EAGAIN || errno == EWOULDBLOCK)
      break;

    if (errno == EINTR)
      continue;

    abort();
  }

  // 执行loop->async_handles里的回调函数
  QUEUE_MOVE(&loop->async_handles, &queue);
  while (!QUEUE_EMPTY(&queue)) {
    q = QUEUE_HEAD(&queue);
    h = QUEUE_DATA(q, uv_async_t, queue);

    QUEUE_REMOVE(q);
    QUEUE_INSERT_TAIL(&loop->async_handles, q);

    /* pending原为0说明没有待处理事件,跳过;原为1则原子地置回0并执行回调 */
    if (cmpxchgi(&h->pending, 1, 0) == 0)
      continue;

    if (h->async_cb == NULL)
      continue;

    h->async_cb(h);
  }
}

这里主要做了两件事:

1.将在uv__async_send()中向fd中写入的数据取干净
2.执行loop->async_handles中pending为1的handle的回调函数(async_cb),并将其pending置回0;这里的async_cb就是我们在uv_loop_init中调用uv_async_init时注册的uv__work_done方法,其中最终调用了用户注册的回调。

总结

由于Node.js异步I/O依赖libuv,libuv的核心又是event loop,本文主要介绍了event loop的流程以及线程池的实现。

Node.js源码-编译

os:macOS 10.13.4,ide:cLion,node版本:v8.2.1

前言

编译node源码主要有三个步骤

$ ./configure
$ make
$ make install

./configure主要用来生成与操作平台相关的编译配置,比如软件装到哪里、带什么参数等信息,执行过后会在./out目录生成相应的构建文件。

make指令根据Makefile的配置对node源码进行编译(包括预编译、编译、链接)生成可执行文件,感兴趣的可以参考刨根问底之node-gyp

make install根据配置将其安装到系统路径下,我们一般自己看源码调试是用不上的

编译过程详解

./configure

收集命令行参数

# Options should be in alphabetical order but keep --prefix at the top,
# that's arguably the one people will be looking for most.
parser.add_option('--prefix',
    action='store',
    dest='prefix',
    default='/usr/local',
    help='select the install prefix [default: %default]')

parser.add_option('--coverage',
    action='store_true',
    dest='coverage',
    help='Build node with code coverage enabled')

parser.add_option('--debug',
    action='store_true',
    dest='debug',
    help='also build debug build')

......

(options, args) = parser.parse_args()

收集到的参数是一个map,配置过程结束时也会将最终的参数信息打印出来。

其中要注意的是在调试时别忘了加上prefix和debug。如果不定义prefix,执行make install会安装到默认的/usr/local目录下;定义debug会按照调试配置编译,最终编译产物输出到out/Debug目录下(下述Makefile中有描述),同时增加一些方便调试的配置(打断点等)。

收集编译器和以下library的参数

# Print a warning when the compiler is too old.
check_compiler(output)

# determine the "flavor" (operating system) we're building for,
# leveraging gyp's GetFlavor function
flavor_params = {}
if (options.dest_os):
  flavor_params['flavor'] = options.dest_os
flavor = GetFlavor(flavor_params)

configure_node(output)
configure_library('zlib', output)
configure_library('http_parser', output)
configure_library('libuv', output)
configure_library('libcares', output)
configure_library('nghttp2', output)
# stay backwards compatible with shared cares builds
output['variables']['node_shared_cares'] = \
    output['variables'].pop('node_shared_libcares')
configure_v8(output)
configure_openssl(output)
configure_intl(output)
configure_static(output)
configure_inspector(output)
check_compiler

在这里我们简单看下python是如何检查编译器的

def try_check_compiler(cc, lang):
  try:
    proc = subprocess.Popen(shlex.split(cc) + ['-E', '-P', '-x', lang, '-'],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
  except OSError:
    return (False, False, '', '')

  proc.stdin.write('__clang__ __GNUC__ __GNUC_MINOR__ __GNUC_PATCHLEVEL__ '
                   '__clang_major__ __clang_minor__ __clang_patchlevel__')

  values = (proc.communicate()[0].split() + ['0'] * 7)[0:7]
  is_clang = values[0] == '1'
  gcc_version = '%s.%s.%s' % tuple(values[1:1+3])
  clang_version = '%s.%s.%s' % tuple(values[4:4+3])

  return (True, is_clang, clang_version, gcc_version)

其实就是新开了一个子进程,在其上执行CXX,然后获取版本信息。

CXX = os.environ.get('CXX', 'c++' if sys.platform == 'darwin' else 'g++')

在OSX中,GYP生成的Makefile底层依赖的是c++,其他操作系统则是g++。

run_gyp

最后执行了run_gyp(gyp_args)

在run_gyp中又做了什么呢?

output_dir = os.path.join(os.path.abspath(node_root), 'out')

def run_gyp(args):
  # GYP bug.
  # On msvs it will crash if it gets an absolute path.
  # On Mac/make it will crash if it doesn't get an absolute path.
  a_path = node_root if sys.platform == 'win32' else os.path.abspath(node_root)
  args.append(os.path.join(a_path, 'node.gyp'))
  common_fn = os.path.join(a_path, 'common.gypi')
  options_fn = os.path.join(a_path, 'config.gypi')
  options_fips_fn = os.path.join(a_path, 'config_fips.gypi')

  if os.path.exists(common_fn):
    args.extend(['-I', common_fn])

  if os.path.exists(options_fn):
    args.extend(['-I', options_fn])

  if os.path.exists(options_fips_fn):
    args.extend(['-I', options_fips_fn])

  args.append('--depth=' + node_root)

  # There's a bug with windows which doesn't allow this feature.
  if sys.platform != 'win32' and 'ninja' not in args:
    # Tell gyp to write the Makefiles into output_dir
    args.extend(['--generator-output', output_dir])

    # Tell make to write its output into the same dir
    args.extend(['-Goutput_dir=' + output_dir])

  args.append('-Dcomponent=static_library')
  args.append('-Dlibrary=static_library')

  # Don't compile with -B and -fuse-ld=, we don't bundle ld.gold.  Can't be
  # set in common.gypi due to how deps/v8/build/toolchain.gypi uses them.
  args.append('-Dlinux_use_bundled_binutils=0')
  args.append('-Dlinux_use_bundled_gold=0')
  args.append('-Dlinux_use_gold_flags=0')

  rc = gyp.main(args)
  if rc != 0:
    print 'Error running GYP'
    sys.exit(rc)

主要做了两件事:

1.收集参数
2.执行gyp.main(args)

比较重要的也是两点:

1.加入node.gyp配置文件
	args.append(os.path.join(a_path, 'node.gyp'))
2.确定了生成目录 generator-output
node.gyp

我们继续深入,看下node.gyp。

node.gyp是一个python的数据结构,打眼看上去似乎很多,容易看着看着就乱了,我们其实可以从target_name入手。这里就不贴代码了,大家感兴趣的可以顺着target_name一个个顺下去。比较重要的是以下几个:

1.node可执行文件
2.定义node_js2c的输出node_javascript.cc
3.node_dtrace动态跟踪框架
4.cctest测试

make

Makefile

执行make的话实际上就是按照当前目录下的makefile执行动作,我们看一下makefile。

ifeq ($(BUILDTYPE),Release)
all: out/Makefile $(NODE_EXE) ## Default target, builds node in out/Release/node.
else
all: out/Makefile $(NODE_EXE) $(NODE_G_EXE)
endif

....

$(NODE_EXE): config.gypi out/Makefile
	$(MAKE) -C out BUILDTYPE=Release V=$(V)
	if [ ! -r $@ -o ! -L $@ ]; then ln -fs out/Release/$(NODE_EXE) $@; fi

$(NODE_G_EXE): config.gypi out/Makefile
	$(MAKE) -C out BUILDTYPE=Debug V=$(V)
	if [ ! -r $@ -o ! -L $@ ]; then ln -fs out/Debug/$(NODE_EXE) $@; fi

如果是debug模式,则会多执行$(NODE_G_EXE)目标,它将BUILDTYPE设置为Debug,产物输出到out/Debug/$(NODE_EXE)。随后两个目标都会对各自的编译产物建立软链:第一次编译时,Release产物链接为node,Debug产物链接为node_g。

out/Makefile

我们再接着看out/makefile

执行编译

TOOLSET := host
# Suffix rules, putting all outputs into $(obj).
$(obj).$(TOOLSET)/%.o: $(srcdir)/%.c FORCE_DO_CMD
	@$(call do_cmd,cc,1)
$(obj).$(TOOLSET)/%.o: $(srcdir)/%.cc FORCE_DO_CMD
	@$(call do_cmd,cxx,1)
$(obj).$(TOOLSET)/%.o: $(srcdir)/%.cpp FORCE_DO_CMD
	@$(call do_cmd,cxx,1)
$(obj).$(TOOLSET)/%.o: $(srcdir)/%.cxx FORCE_DO_CMD
	@$(call do_cmd,cxx,1)
$(obj).$(TOOLSET)/%.o: $(srcdir)/%.m FORCE_DO_CMD
	@$(call do_cmd,objc,1)
$(obj).$(TOOLSET)/%.o: $(srcdir)/%.mm FORCE_DO_CMD
	@$(call do_cmd,objcxx,1)
$(obj).$(TOOLSET)/%.o: $(srcdir)/%.S FORCE_DO_CMD
	@$(call do_cmd,cc,1)
$(obj).$(TOOLSET)/%.o: $(srcdir)/%.s FORCE_DO_CMD
	@$(call do_cmd,cc,1)

这里执行do_cmd将所有源文件进行编译。

do_cmd

接下来我们看下do_cmd做了什么?

# do_cmd: run a command via the above cmd_foo names, if necessary.
# Should always run for a given target to handle command-line changes.
# Second argument, if non-zero, makes it do asm/C/C++ dependency munging.
# Third argument, if non-zero, makes it do POSTBUILDS processing.
# Note: We intentionally do NOT call dirx for depfile, since it contains ? for
# spaces already and dirx strips the ? characters.
define do_cmd
$(if $(or $(command_changed),$(prereq_changed)),
  @$(call exact_echo,  $($(quiet)cmd_$(1)))
  @mkdir -p "$(call dirx,$@)" "$(dir $(depfile))"
  $(if $(findstring flock,$(word 2,$(cmd_$1))),
    @$(cmd_$(1))
    @echo "  $(quiet_cmd_$(1)): Finished",
    @$(cmd_$(1))
  )
  @$(call exact_echo,$(call escape_vars,cmd_$(call replace_spaces,$@) := $(cmd_$(1)))) > $(depfile)
  @$(if $(2),$(fixup_dep))
  $(if $(and $(3), $(POSTBUILDS)),
    $(call do_postbuilds)
  )
)
endef

其实就是根据参数(源文件类型),执行不同的指令,比如.cc文件就是利用CXX进行编译。

quiet_cmd_cxx = CXX($(TOOLSET)) $@
cmd_cxx = $(CXX.$(TOOLSET)) $(GYP_CXXFLAGS) $(DEPFLAGS) $(CXXFLAGS.$(TOOLSET)) -c -o $@ $<

监听所有.mk文件

ifeq ($(strip $(foreach prefix,$(NO_LOAD),\
    $(findstring $(join ^,$(prefix)),\
                 $(join ^,cctest.target.mk)))),)
  include cctest.target.mk
endif
ifeq ($(strip $(foreach prefix,$(NO_LOAD),\
    $(findstring $(join ^,$(prefix)),\
                 $(join ^,deps/cares/cares.target.mk)))),)
  include deps/cares/cares.target.mk
endif

这里include了所有.mk文件,相当于监听了所有node相关的构建配置,只要这些关联文件有改动,make时就会触发out/Makefile的重新编译。

make install

make install用于将可执行文件安装到./configure中prefix指定的目录下,我们看源码调试过程中用不上。

总结

本文介绍了node源码编译的大致过程。至于调试,用CLion IDE即可,网上有很多文章介绍过,大家试着配一下就好了。

本文可能有很多不准确的地方,欢迎大家纠正。

关于请求被挂起的问题排查

起因

起因是公司的某MIS系统中某个HTTP请求耗时很长,有时达到几十秒,导致xhr触发了onTimeout事件;在chrome的network面板中,该请求一直处于stalled阶段。

可能的原因

Stalled/Blocking的解释如下:

Time the request spent waiting before it could be sent. This time is inclusive of any time spent in proxy negotiation. Additionally, this time will include when the browser is waiting for an already established connection to become available for re-use, obeying Chrome's maximum six TCP connection per origin rule.

通过上面的描述,我们知道Stalled/Blocking其实是建立连接前的时间,其中包括了代理协商、等待socket复用的时间。

我们推测可能有如下原因:

  • 同源下请求过多,导致一直在等待socket复用
  • 代理协商过程中,存在TCP/IP包一直重传,这个重传可能是由于之前同源下的连接被路由器或ISP或我们的服务断开,但没有通知浏览器。

排查

排除同源请求过多

我们去掉了同源下的大部分请求,问题仍然存在,说明同源请求过多不是关键因素。

Wireshark抓包

随即想到用Wireshark抓包,看下TCP包的传输情况。但此case是偶现,可操作该后台(有机会复现)的同学散布在全国各地且都是非研发,远程帮助他们装Wireshark成本较高,只能作为备选项。

日志探究

接着想到了chrome日志(比network面板内容详细太多),随即让操作同学打开chrome日志,出现问题后,将日志发给我。

得到的错误日志如下:

我们可以看到实际的状态处于IO_PENDING,load_state处于WAITING_FOR_RESPONSE

看到这里我们大概明白了:应该是在代理协商阶段服务一直没有响应,导致了超时。超时时间设置的是40s,也与日志吻合。

Chromium 源码

好奇这两个状态,于是看了下chromium源码:

从注释中可以看出,WAITING_FOR_RESPONSE代表着请求发出但是没有接到响应头的状态。

IO_PENDING比较宽泛,请求发出到未完成都属于IO_PENDING

chrome日志怎么看

其中+代表动作开始,-代表结束,--->代表动作的状态。

我们可以看出,一个完整请求首先进行域名解析(host resolve),接着建立socket连接。

Node.js源码-一个node程序是如何运行的

本文从node入口出发,一步一步的阅读源码,直到运行结束。

node入口

node的入口是node/src/node_main.cc文件,main函数代码如下:

int main(int argc, char *argv[]) {
#if defined(__POSIX__) && defined(NODE_SHARED_MODE)
  // In node::PlatformInit(), we squash all signal handlers for non-shared lib
  // build. In order to run test cases against shared lib build, we also need
  // to do the same thing for shared lib build here, but only for SIGPIPE for
  // now. If node::PlatformInit() is moved to here, then this section could be
  // removed.
  // socket一端close的情况下,进程第二次write会触发操作系统给进程发送SIGPIPE信号,默认处理是终止进程
  // SIG_IGN作为处理函数,将忽略该信号
  {
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_handler = SIG_IGN;
    sigaction(SIGPIPE, &act, nullptr);
  }
#endif

#if defined(__linux__)
  char** envp = environ;
  while (*envp++ != nullptr) {}
  Elf_auxv_t* auxv = reinterpret_cast<Elf_auxv_t*>(envp);
  for (; auxv->a_type != AT_NULL; auxv++) {
    if (auxv->a_type == AT_SECURE) {
      node::linux_at_secure = auxv->a_un.a_val;
      break;
    }
  }
#endif
  // Disable stdio buffering, it interacts poorly with printf()
  // calls elsewhere in the program (e.g., any logging from V8.)
  setvbuf(stdout, nullptr, _IONBF, 0);
  setvbuf(stderr, nullptr, _IONBF, 0);
  return node::Start(argc, argv);
}
#endif

这里主要做了三件事:

1.屏蔽SIGPIPE信号(具体可看代码注释)
2.定义node::linux_at_secure,其值取自linux的ELF辅助向量(Elf_auxv_t,动态链接器所需的辅助信息)中的AT_SECURE项
3.node::Start(argc, argv)

node::Start执行流程

node::Start代码如下:

int Start(int argc, char** argv) {
  atexit([] () { uv_tty_reset_mode(); });
  PlatformInit();
  performance::performance_node_start = PERFORMANCE_NOW();

  CHECK_GT(argc, 0);

  // Hack around with the argv pointer. Used for process.title = "blah".
  argv = uv_setup_args(argc, argv);

  // This needs to run *before* V8::Initialize().  The const_cast is not
  // optional, in case you're wondering.
  int exec_argc;
  const char** exec_argv;
  Init(&argc, const_cast<const char**>(argv), &exec_argc, &exec_argv);

#if HAVE_OPENSSL
  {
    std::string extra_ca_certs;
    if (SafeGetenv("NODE_EXTRA_CA_CERTS", &extra_ca_certs))
      crypto::UseExtraCaCerts(extra_ca_certs);
  }
#ifdef NODE_FIPS_MODE
  // In the case of FIPS builds we should make sure
  // the random source is properly initialized first.
  OPENSSL_init();
#endif  // NODE_FIPS_MODE
  // V8 on Windows doesn't have a good source of entropy. Seed it from
  // OpenSSL's pool.
  V8::SetEntropySource(crypto::EntropySource);
#endif  // HAVE_OPENSSL

  v8_platform.Initialize(v8_thread_pool_size);
  // Enable tracing when argv has --trace-events-enabled.
  v8_platform.StartTracingAgent();
  V8::Initialize();
  performance::performance_v8_start = PERFORMANCE_NOW();
  v8_initialized = true;
  const int exit_code =
      Start(uv_default_loop(), argc, argv, exec_argc, exec_argv);
  v8_platform.StopTracingAgent();
  v8_initialized = false;
  V8::Dispose();

  // uv_run cannot be called from the time before the beforeExit callback
  // runs until the program exits unless the event loop has any referenced
  // handles after beforeExit terminates. This prevents unrefed timers
  // that happen to terminate during shutdown from being run unsafely.
  // Since uv_run cannot be called, uv_async handles held by the platform
  // will never be fully cleaned up.
  v8_platform.Dispose();

  delete[] exec_argv;
  exec_argv = nullptr;

  return exit_code;
}

1.PlatformInit

inline void PlatformInit() {
#ifdef __POSIX__
#if HAVE_INSPECTOR
  // 信号集,描述信号的集合
  // 每个信号占用一位(64位)
  sigset_t sigmask;
  sigemptyset(&sigmask);
  sigaddset(&sigmask, SIGUSR1);
  // 将主线程的信号屏蔽字设置为只含SIGUSR1,即仅屏蔽SIGUSR1信号
  // (SIGUSR1用于激活inspector,由专门的线程处理)
  const int err = pthread_sigmask(SIG_SETMASK, &sigmask, nullptr);
#endif  // HAVE_INSPECTOR

  // Make sure file descriptors 0-2 are valid before we start logging anything.
  for (int fd = STDIN_FILENO; fd <= STDERR_FILENO; fd += 1) {
    struct stat ignored;
    if (fstat(fd, &ignored) == 0)
      continue;
    // Anything but EBADF means something is seriously wrong.  We don't
    // have to special-case EINTR, fstat() is not interruptible.
    if (errno != EBADF)
      ABORT();
    if (fd != open("/dev/null", O_RDWR))
      ABORT();
  }

#if HAVE_INSPECTOR
  CHECK_EQ(err, 0);
#endif  // HAVE_INSPECTOR

#ifndef NODE_SHARED_MODE
  // Restore signal dispositions, the parent process may have changed them.
  struct sigaction act;
  memset(&act, 0, sizeof(act));

  // The hard-coded upper limit is because NSIG is not very reliable; on Linux,
  // it evaluates to 32, 34 or 64, depending on whether RT signals are enabled.
  // Counting up to SIGRTMIN doesn't work for the same reason.
  // As in node_main, ignore SIGPIPE and reset everything else to SIG_DFL.
  // Unlike pthread_sigmask, sigaction and signal install process-wide
  // handlers shared by every thread.
  for (unsigned nr = 1; nr < kMaxSignal; nr += 1) {
    if (nr == SIGKILL || nr == SIGSTOP)
      continue;
    act.sa_handler = (nr == SIGPIPE) ? SIG_IGN : SIG_DFL;
    CHECK_EQ(0, sigaction(nr, &act, nullptr));
  }
#endif  // !NODE_SHARED_MODE

  RegisterSignalHandler(SIGINT, SignalExit, true);
  RegisterSignalHandler(SIGTERM, SignalExit, true);

  // Raise the open file descriptor limit.
  struct rlimit lim;
  if (getrlimit(RLIMIT_NOFILE, &lim) == 0 && lim.rlim_cur != lim.rlim_max) {
    // Do a binary search for the limit.
    rlim_t min = lim.rlim_cur;
    rlim_t max = 1 << 20;
    // But if there's a defined upper bound, don't search, just set it.
    if (lim.rlim_max != RLIM_INFINITY) {
      min = lim.rlim_max;
      max = lim.rlim_max;
    }
    do {
      lim.rlim_cur = min + (max - min) / 2;
      if (setrlimit(RLIMIT_NOFILE, &lim)) {
        max = lim.rlim_cur;
      } else {
        min = lim.rlim_cur;
      }
    } while (min + 1 < max);
  }
#endif  // __POSIX__
#ifdef _WIN32
  for (int fd = 0; fd <= 2; ++fd) {
    auto handle = reinterpret_cast<HANDLE>(_get_osfhandle(fd));
    if (handle == INVALID_HANDLE_VALUE ||
        GetFileType(handle) == FILE_TYPE_UNKNOWN) {
      // Ignore _close result. If it fails or not depends on used Windows
      // version. We will just check _open result.
      _close(fd);
      if (fd != _open("nul", _O_RDWR))
        ABORT();
    }
  }
#endif  // _WIN32
}

PlatformInit does the following:

1. Uses pthread_sigmask to block SIGUSR1 (and only SIGUSR1) in the calling thread, leaving it for the inspector
2. Checks that file descriptors 0-2 (stdin/stdout/stderr) are valid before anything gets logged
3. For non-shared-library builds, restores signal dispositions and ignores SIGPIPE, matching what node_main does for shared builds
4. Registers SignalExit as the handler for SIGINT and SIGTERM via RegisterSignalHandler
5. Raises the process's open-file-descriptor limit

Let me expand on a few of these points.

pthread_sigmask vs. sigaction

pthread_sigmask sets the signal mask of the calling thread only, while sigaction installs a signal handler for the whole process, shared by every thread. In other words, each thread can have its own signal mask, but calling signal or sigaction from any thread changes the handler seen by all of them.

Raising the open-file-descriptor limit

The code above uses setrlimit: when the hard limit (rlim_max) is finite, it is applied directly; otherwise a binary search between lim.rlim_cur and 1 << 20 finds the largest soft limit the kernel will accept.

2.uv_setup_args(argc, argv)

This simply copies argv and returns new_argv; the copy is what process.title later uses.

3.Init

void Init(int* argc,
          const char** argv,
          int* exec_argc,
          const char*** exec_argv) {
  // Initialize prog_start_time to get relative uptime.
  prog_start_time = static_cast<double>(uv_now(uv_default_loop()));

  // Register built-in modules
  RegisterBuiltinModules();

  // Make inherited handles noninheritable.
  uv_disable_stdio_inheritance();

#if defined(NODE_V8_OPTIONS)
  // Should come before the call to V8::SetFlagsFromCommandLine()
  // so the user can disable a flag --foo at run-time by passing
  // --no_foo from the command line.
  // Set V8 startup flags from the command line.
  V8::SetFlagsFromString(NODE_V8_OPTIONS, sizeof(NODE_V8_OPTIONS) - 1);
#endif

  // Read various settings from environment variables.
  {
    std::string text;
    config_pending_deprecation =
        SafeGetenv("NODE_PENDING_DEPRECATION", &text) && text[0] == '1';
  }

  // Allow for environment set preserving symlinks.
  {
    std::string text;
    config_preserve_symlinks =
        SafeGetenv("NODE_PRESERVE_SYMLINKS", &text) && text[0] == '1';
  }

  if (config_warning_file.empty())
    SafeGetenv("NODE_REDIRECT_WARNINGS", &config_warning_file);

#if HAVE_OPENSSL
  if (openssl_config.empty())
    SafeGetenv("OPENSSL_CONF", &openssl_config);
#endif

#if !defined(NODE_WITHOUT_NODE_OPTIONS)
  std::string node_options;
  if (SafeGetenv("NODE_OPTIONS", &node_options)) {
    // Smallest tokens are 2-chars (a not space and a space), plus 2 extra
    // pointers, for the prepended executable name, and appended NULL pointer.
    size_t max_len = 2 + (node_options.length() + 1) / 2;
    const char** argv_from_env = new const char*[max_len];
    int argc_from_env = 0;
    // [0] is expected to be the program name, fill it in from the real argv.
    argv_from_env[argc_from_env++] = argv[0];

    char* cstr = strdup(node_options.c_str());
    char* initptr = cstr;
    char* token;
    while ((token = strtok(initptr, " "))) {  // NOLINT(runtime/threadsafe_fn)
      initptr = nullptr;
      argv_from_env[argc_from_env++] = token;
    }
    argv_from_env[argc_from_env] = nullptr;
    int exec_argc_;
    const char** exec_argv_ = nullptr;
    ProcessArgv(&argc_from_env, argv_from_env, &exec_argc_, &exec_argv_, true);
    delete[] exec_argv_;
    delete[] argv_from_env;
    free(cstr);
  }
#endif

  // Split argv into node's and V8's arguments.
  ProcessArgv(argc, argv, exec_argc, exec_argv);

#if defined(NODE_HAVE_I18N_SUPPORT)
  // If the parameter isn't given, use the env variable.
  if (icu_data_dir.empty())
    SafeGetenv("NODE_ICU_DATA", &icu_data_dir);
  // Initialize ICU.
  // If icu_data_dir is empty here, it will load the 'minimal' data.
  if (!i18n::InitializeICUDirectory(icu_data_dir)) {
    fprintf(stderr,
            "%s: could not initialize ICU "
            "(check NODE_ICU_DATA or --icu-data-dir parameters)\n",
            argv[0]);
    exit(9);
  }
#endif

  // Needed for access to V8 intrinsics.  Disabled again during bootstrapping,
  // see lib/internal/bootstrap/node.js.
  // Allow user code to call V8 intrinsics;
  // such calls start with %, as we will see later.
  const char allow_natives_syntax[] = "--allow_natives_syntax";
  V8::SetFlagsFromString(allow_natives_syntax,
                         sizeof(allow_natives_syntax) - 1);

  // We should set node_is_initialized here instead of in node::Start,
  // otherwise embedders using node::Init to initialize everything will not be
  // able to set it and native modules will not load for them.
  node_is_initialized = true;
}

Init does the following:

1. Registers the built-in modules
2. Disables inherited file descriptors (sets CLOEXEC on them)
3. Sets V8 startup flags
4. Reads various settings from environment variables via getenv()
5. Splits argv into node's and V8's runtime arguments (exec_argv)
6. Sets the V8 flag --allow_natives_syntax

Again, a few points deserve a closer look.

RegisterBuiltinModules

This registers the built-in modules, i.e. the .cc files under src.

void RegisterBuiltinModules() {
#define V(modname) _register_##modname();
  NODE_BUILTIN_MODULES(V)
#undef V
}

RegisterBuiltinModules does two things:

1. Defines the macro V
2. Expands NODE_BUILTIN_MODULES with it

NODE_BUILTIN_MODULES is itself a macro, defined as:

#define NODE_BUILTIN_MODULES(V)                                               \
  NODE_BUILTIN_STANDARD_MODULES(V)                                            \
  NODE_BUILTIN_OPENSSL_MODULES(V)                                             \
  NODE_BUILTIN_ICU_MODULES(V)

NODE_BUILTIN_STANDARD_MODULES is defined as:

#define NODE_BUILTIN_STANDARD_MODULES(V)                                      \
    V(async_wrap)                                                             \
    V(buffer)                                                                 \
    V(cares_wrap)                                                             \
	......

So registering every module boils down to calling _register_##modname() once per module name.

_register_##modname() is generated by the following macro:

static node::node_module _module = {                                        \
    NODE_MODULE_VERSION,                                                      \
    flags,                                                                    \
    nullptr,                                                                  \
    __FILE__,                                                                 \
    nullptr,                                                                  \
    (node::addon_context_register_func) (regfunc),                            \
    NODE_STRINGIFY(modname),                                                  \
    priv,                                                                     \
    nullptr                                                                   \
  };                                                                          \
  void _register_ ## modname() {                                              \
    node_module_register(&_module);                                           \
  }

node_module_register is defined in src/node.cc:

extern "C" void node_module_register(void* m) {
  struct node_module* mp = reinterpret_cast<struct node_module*>(m);

  if (mp->nm_flags & NM_F_BUILTIN) {
    mp->nm_link = modlist_builtin;
    modlist_builtin = mp;
  } else if (mp->nm_flags & NM_F_INTERNAL) {
    mp->nm_link = modlist_internal;
    modlist_internal = mp;
  } else if (!node_is_initialized) {
    // "Linked" modules are included as part of the node project.
    // Like builtins they are registered *before* node::Init runs.
    mp->nm_flags = NM_F_LINKED;
    mp->nm_link = modlist_linked;
    modlist_linked = mp;
  } else {
    modpending = mp;
  }
}

It simply prepends the module defined above to the modlist_builtin linked list.

uv_disable_stdio_inheritance

void uv_disable_stdio_inheritance(void) {
  int fd;

  /* Set the CLOEXEC flag on all open descriptors. Unconditionally try the
   * first 16 file descriptors. After that, bail out after the first error.
   */
  for (fd = 0; ; fd++)
    if (uv__cloexec(fd, 1) && fd > 15)
      break;
}

This relies on CLOEXEC: descriptors carrying the flag are closed automatically when a child process calls exec. Why is this needed? When a process forks, the child inherits the parent's file descriptors along with a copy of its stack, but once the child calls exec its address space is replaced, so the variables holding those descriptors are gone and the descriptors could never be closed explicitly. CLOEXEC solves exactly this: the kernel closes the flagged descriptors at exec time.

--allow_natives_syntax

Setting --allow_natives_syntax lets user code call V8's built-in intrinsic functions; such calls must start with %.

4.The OpenSSL check

#if HAVE_OPENSSL
  {
    std::string extra_ca_certs;
    if (SafeGetenv("NODE_EXTRA_CA_CERTS", &extra_ca_certs))
      crypto::UseExtraCaCerts(extra_ca_certs);
  }

This checks whether OpenSSL is compiled in; if so, extra CA certificates are loaded from the NODE_EXTRA_CA_CERTS environment variable.

5.v8_platform.Initialize

void Initialize(int thread_pool_size) {
    tracing_agent_.reset(new tracing::Agent(trace_file_pattern));
    platform_ = new NodePlatform(thread_pool_size,
        tracing_agent_->GetTracingController());
    V8::InitializePlatform(platform_);
    tracing::TraceEventHelper::SetTracingController(
        tracing_agent_->GetTracingController());
  }

This creates the NodePlatform with the given thread-pool size, hands it to V8::InitializePlatform, and wires up the tracing controller.

6.V8::Initialize();

This is V8's own initialization, defined in deps/v8/src/v8.cc:

bool V8::Initialize() {
  InitializeOncePerProcess();
  return true;
}

What does InitializeOncePerProcess do?

void V8::InitializeOncePerProcess() {
  base::CallOnce(&init_once, &InitializeOncePerProcessImpl);
}

CallOnce

CallOnce, as the name suggests, runs its argument only once: it checks whether init_once has reached ONCE_STATE_DONE to decide whether the function has already run.

inline void CallOnce(OnceType* once, NoArgFunction init_func) {
  if (Acquire_Load(once) != ONCE_STATE_DONE) {
    CallOnceImpl(once, init_func);
  }
}

Acquire_Load atomically reads the value of once; CallOnceImpl is where once gets updated and init_func gets executed.

Acquire_Load is defined as:

inline Atomic32 Acquire_Load(volatile const Atomic32* ptr) {
  return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}

__atomic_load_n atomically loads the value stored at the memory ptr points to, with acquire ordering.

The core of CallOnceImpl:

if (state == ONCE_STATE_UNINITIALIZED) {
    // We are the first thread to call this function, so we have to call the
    // function.
    init_func();
    Release_Store(once, ONCE_STATE_DONE);

It does two things:

1. Runs init_func
2. Atomically stores ONCE_STATE_DONE into once, marking it as already executed for this process

InitializeOncePerProcessImpl

void V8::InitializeOncePerProcessImpl() {
  FlagList::EnforceFlagImplications();

  if (FLAG_predictable && FLAG_random_seed == 0) {
    // Avoid random seeds in predictable mode.
    FLAG_random_seed = 12347;
  }

  if (FLAG_stress_compaction) {
    FLAG_force_marking_deque_overflows = true;
    FLAG_gc_global = true;
    FLAG_max_semi_space_size = 1;
  }

  base::OS::Initialize(FLAG_hard_abort, FLAG_gc_fake_mmap);

  if (FLAG_random_seed) SetRandomMmapSeed(FLAG_random_seed);

  // Thread-related initialization:
  // create TLS keys, thread_data_table_, etc.
  Isolate::InitializeOncePerProcess();

#if defined(USE_SIMULATOR)
  Simulator::InitializeOncePerProcess();
#endif
  sampler::Sampler::SetUp();
  CpuFeatures::Probe(false);
  ElementsAccessor::InitializeOncePerProcess();
  ExternalReference::SetUp();
  Bootstrapper::InitializeOncePerProcess();
}

This mainly does two things:

1. OS-related initialization
2. Thread-related initialization: TLS keys, thread_data_table_, etc.

Isolate::InitializeOncePerProcess

void Isolate::InitializeOncePerProcess() {
  // LockGuard manages the mutex RAII-style, much like a smart pointer:
  // the lock is released when the guard is destroyed as the stack unwinds.
  // A lock guard is an object that manages a mutex object by keeping it always locked.
  base::LockGuard<base::Mutex> lock_guard(thread_data_table_mutex_.Pointer());
  CHECK_NULL(thread_data_table_);
  // Thread-local storage: pthread_key_create() on POSIX,
  // TlsAlloc() on Windows.
  isolate_key_ = base::Thread::CreateThreadLocalKey();
#if DEBUG
  base::Relaxed_Store(&isolate_key_created_, 1);
#endif
  thread_id_key_ = base::Thread::CreateThreadLocalKey();
  per_isolate_thread_data_key_ = base::Thread::CreateThreadLocalKey();
  // ThreadDataTable is a linked list.
  thread_data_table_ = new Isolate::ThreadDataTable();
}

Three things happen here:

1. Takes the mutex
2. Allocates thread-local storage keys (pthread_key_create / TlsAlloc)
3. Creates the thread_data_table_ list

7.Start(uv_default_loop(), argc, argv, exec_argc, exec_argv)

inline int Start(uv_loop_t* event_loop,
                 int argc, const char* const* argv,
                 int exec_argc, const char* const* exec_argv) {
  Isolate::CreateParams params;
  // The Buffer allocator: node Buffers are not allocated from
  // V8-managed memory but straight from the heap.
  ArrayBufferAllocator allocator;
  params.array_buffer_allocator = &allocator;
#ifdef NODE_ENABLE_VTUNE_PROFILING
  params.code_event_handler = vTune::GetVtuneCodeEventHandler();
#endif

  Isolate* const isolate = Isolate::New(params);
  if (isolate == nullptr)
    return 12;  // Signal internal error.

  // Attach listener callbacks to the isolate; this one is for messages.
  isolate->AddMessageListener(OnMessage);
  isolate->SetAbortOnUncaughtExceptionCallback(ShouldAbortOnUncaughtException);
  isolate->SetMicrotasksPolicy(v8::MicrotasksPolicy::kExplicit);
  isolate->SetFatalErrorHandler(OnFatalError);
  isolate->SetAllowWasmCodeGenerationCallback(AllowWasmCodeGenerationCallback);

  {
    // ScopedLock, an upgraded variant of lock_guard.
    Mutex::ScopedLock scoped_lock(node_isolate_mutex);
    CHECK_EQ(node_isolate, nullptr);
    node_isolate = isolate;
  }

  int exit_code;
  {
    // Lock the isolate, because an Isolate is not thread-safe.
    Locker locker(isolate);
    Isolate::Scope isolate_scope(isolate);
    HandleScope handle_scope(isolate);
    IsolateData isolate_data(
        isolate,
        event_loop,
        v8_platform.Platform(),
        allocator.zero_fill_field());
    if (track_heap_objects) {
      isolate->GetHeapProfiler()->StartTrackingHeapObjects(true);
    }
    exit_code = Start(isolate, &isolate_data, argc, argv, exec_argc, exec_argv);
  }

  {
    Mutex::ScopedLock scoped_lock(node_isolate_mutex);
    CHECK_EQ(node_isolate, isolate);
    node_isolate = nullptr;
  }

  isolate->Dispose();

  return exit_code;
}

The main steps:

1. Initializes the isolate's params. Note array_buffer_allocator: it is used when allocating Buffers. Node Buffers are taken straight from the heap rather than from V8-managed memory, which is why Buffers are not subject to V8's memory limit
2. Creates the Isolate
3. Attaches listener callbacks to the Isolate
4. Calls Start(isolate, &isolate_data, argc, argv, exec_argc, exec_argv)

array_buffer_allocator

ArrayBufferAllocator::Allocate is essentially a realloc call, growing the block that pointer refers to up to full_size:

allocated = realloc(pointer, full_size);

Attaching listener callbacks to the Isolate

Take AddMessageListener as an example; it ultimately calls Isolate::AddMessageListenerWithErrorLevel:

bool Isolate::AddMessageListenerWithErrorLevel(MessageCallback that,
                                               int message_levels,
                                               Local<Value> data) {
  i::Isolate* isolate = reinterpret_cast<i::Isolate*>(this);
  ENTER_V8_NO_SCRIPT_NO_EXCEPTION(isolate);
  i::HandleScope scope(isolate);
  i::Handle<i::TemplateList> list = isolate->factory()->message_listeners();
  i::Handle<i::FixedArray> listener = isolate->factory()->NewFixedArray(3);
  i::Handle<i::Foreign> foreign =
      isolate->factory()->NewForeign(FUNCTION_ADDR(that));
  listener->set(0, *foreign);
  listener->set(1, data.IsEmpty() ? isolate->heap()->undefined_value()
                                  : *Utils::OpenHandle(*data));
  listener->set(2, i::Smi::FromInt(message_levels));
  list = i::TemplateList::Add(isolate, list, listener);
  isolate->heap()->SetMessageListeners(*list);
  return true;
}

In effect this registers a listener with the heap: the listener (callback, data, level mask) is appended to the message_listeners template list.

8.Start(isolate, &isolate_data, argc, argv, exec_argc, exec_argv)

inline int Start(Isolate* isolate, IsolateData* isolate_data,
                 int argc, const char* const* argv,
                 int exec_argc, const char* const* exec_argv) {
  HandleScope handle_scope(isolate);
  Local<Context> context = NewContext(isolate);
  Context::Scope context_scope(context);
  Environment env(isolate_data, context, v8_platform.GetTracingAgent());
  // Initialize libuv handles and the process object.
  env.Start(argc, argv, exec_argc, exec_argv, v8_is_profiling);

  const char* path = argc > 1 ? argv[1] : nullptr;
  StartInspector(&env, path, debug_options);

  if (debug_options.inspector_enabled() && !v8_platform.InspectorStarted(&env))
    return 12;  // Signal internal error.

  env.set_abort_on_uncaught_exception(abort_on_uncaught_exception);

  if (no_force_async_hooks_checks) {
    env.async_hooks()->no_force_checks();
  }

  {
    Environment::AsyncCallbackScope callback_scope(&env);
    env.async_hooks()->push_async_ids(1, 0);
    LoadEnvironment(&env);
    env.async_hooks()->pop_async_id(1);
  }
  
  ......
}

The main steps:

1. Calls env.Start() to initialize the libuv handles and the process object
2. Calls LoadEnvironment()

LoadEnvironment()

void LoadEnvironment(Environment* env) {
  HandleScope handle_scope(env->isolate());

  TryCatch try_catch(env->isolate());
  // Disable verbose mode to stop FatalException() handler from trying
  // to handle the exception. Errors this early in the start-up phase
  // are not safe to ignore.
  try_catch.SetVerbose(false);

  // The bootstrapper scripts are lib/internal/bootstrap/loaders.js and
  // lib/internal/bootstrap/node.js, each included as a static C string
  // defined in node_javascript.h, generated in node_javascript.cc by
  // node_js2c.
  Local<String> loaders_name =
      FIXED_ONE_BYTE_STRING(env->isolate(), "internal/bootstrap/loaders.js");
  // LoadersBootstrapperSource returns the ASCII source of loaders.js
  // produced by node_js2c.
  Local<Function> loaders_bootstrapper =
      GetBootstrapper(env, LoadersBootstrapperSource(env), loaders_name);
  Local<String> node_name =
      FIXED_ONE_BYTE_STRING(env->isolate(), "internal/bootstrap/node.js");
  Local<Function> node_bootstrapper =
      GetBootstrapper(env, NodeBootstrapperSource(env), node_name);

  // Add a reference to the global object
  Local<Object> global = env->context()->Global();

#if defined HAVE_DTRACE || defined HAVE_ETW
  InitDTrace(env, global);
#endif

#if defined HAVE_PERFCTR
  InitPerfCounters(env, global);
#endif

  // Enable handling of uncaught exceptions
  // (FatalException(), break on uncaught exception in debugger)
  //
  // This is not strictly necessary since it's almost impossible
  // to attach the debugger fast enough to break on exception
  // thrown during process startup.
  try_catch.SetVerbose(true);

  env->SetMethod(env->process_object(), "_rawDebug", RawDebug);

  // Expose the global object as a property on itself
  // (Allows you to set stuff on `global` from anywhere in JavaScript.)
  global->Set(FIXED_ONE_BYTE_STRING(env->isolate(), "global"), global);

  // Create binding loaders
  v8::Local<v8::Function> get_binding_fn =
      env->NewFunctionTemplate(GetBinding)->GetFunction(env->context())
          .ToLocalChecked();

  v8::Local<v8::Function> get_linked_binding_fn =
      env->NewFunctionTemplate(GetLinkedBinding)->GetFunction(env->context())
          .ToLocalChecked();

  v8::Local<v8::Function> get_internal_binding_fn =
      env->NewFunctionTemplate(GetInternalBinding)->GetFunction(env->context())
          .ToLocalChecked();

  Local<Value> loaders_bootstrapper_args[] = {
    env->process_object(),
    get_binding_fn,
    get_linked_binding_fn,
    get_internal_binding_fn
  };

  // Bootstrap internal loaders
  Local<Value> bootstrapped_loaders;
  if (!ExecuteBootstrapper(env, loaders_bootstrapper,
                           arraysize(loaders_bootstrapper_args),
                           loaders_bootstrapper_args,
                           &bootstrapped_loaders)) {
    return;
  }

  // Bootstrap Node.js
  Local<Value> bootstrapped_node;
  // bootstrapped_loaders holds the { internalBinding, NativeModule } pair
  // returned by running loaders_bootstrapper.
  Local<Value> node_bootstrapper_args[] = {
    env->process_object(),
    bootstrapped_loaders
  };
  if (!ExecuteBootstrapper(env, node_bootstrapper,
                           arraysize(node_bootstrapper_args),
                           node_bootstrapper_args,
                           &bootstrapped_node)) {
    return;
  }
}

LoadEnvironment does the following:

1. Fetches the ASCII source of node.js and loaders.js from node_javascript.cc. As briefly covered in the [previous article](https://github.com/tsy77/blog/issues/6), js2c.py converts every js file under ./lib into ASCII arrays stored in node_javascript.cc
2. Creates v8::Local<v8::Function> get_binding_fn, get_linked_binding_fn, get_internal_binding_fn
3. Executes loaders.js and node.js; node.js is where the script we asked node to run finally executes

node_js2c

Here is an excerpt from node_javascript.cc:

static const uint8_t raw_internal_bootstrap_loaders_key[] = { 105,110,116,101,114,110,97,108,47,98,111,111,116,115,116,114,97,112,47,108,
111,97,100,101,114,115 };
static struct : public v8::String::ExternalOneByteStringResource {
  const char* data() const override {
    return reinterpret_cast<const char*>(raw_internal_bootstrap_loaders_key);
  }
  size_t length() const override { return arraysize(raw_internal_bootstrap_loaders_key); }
  void Dispose() override { /* Default calls `delete this`. */ }
  v8::Local<v8::String> ToStringChecked(v8::Isolate* isolate) {
    return v8::String::NewExternalOneByte(isolate, this).ToLocalChecked();
  }
} internal_bootstrap_loaders_key;

static const uint8_t raw_internal_bootstrap_loaders_value[] = { 47,47,32,84,104,105,115,32,102,105,108,101,32,99,114,101,97,116,101,115,
32,116,104,101,32,105,110,116,101,114,110,97,108,32,109,111,100,117,108,101,
32,38,32,98,105,110,100,105,110,103,32,108,111,97,100,101,114,115,32,117,
115,101,100,32,98,121,32,98,117,105,108,116,45,105,110,10,47,47,32,109,
111,100,117,108,101,115,46,32,73,110,32,99,111,110,116,114,97,115,116,44,
32,117,115,101,114,32,108,97,110,100,32,109,111,100,117,108,101,115,32,97,
114,101,32,108,111,97,100,101,100, };
static struct : public v8::String::ExternalOneByteStringResource {
  const char* data() const override {
    return reinterpret_cast<const char*>(raw_internal_bootstrap_loaders_value);
  }
  size_t length() const override { return arraysize(raw_internal_bootstrap_loaders_value); }
  void Dispose() override { /* Default calls `delete this`. */ }
  v8::Local<v8::String> ToStringChecked(v8::Isolate* isolate) {
    return v8::String::NewExternalOneByte(isolate, this).ToLocalChecked();
  }
} internal_bootstrap_loaders_value;

There are two arrays and two structs here: raw_internal_bootstrap_loaders_key and raw_internal_bootstrap_loaders_value hold the key and the value (the file contents) of bootstrap/loaders.js. The two structs, internal_bootstrap_loaders_key and internal_bootstrap_loaders_value, each expose a ToStringChecked method, which reads data() underneath; calling internal_bootstrap_loaders_value.ToStringChecked() therefore turns the stored character codes back into the source string.

So how is node_javascript.cc produced in the first place?

{
      'target_name': 'node_js2c',
      'type': 'none',
      'toolsets': ['host'],
      'actions': [
        {
          'action_name': 'node_js2c',
          'process_outputs_as_sources': 1,
          'inputs': [
            '<@(library_files)',
            './config.gypi',
            'tools/check_macros.py'
          ],
          'outputs': [
            '<(SHARED_INTERMEDIATE_DIR)/node_javascript.cc',
          ],
          'conditions': [
            [ 'node_use_dtrace=="false" and node_use_etw=="false"', {
              'inputs': [ 'src/notrace_macros.py' ]
            }],
            [ 'node_use_perfctr=="false"', {
              'inputs': [ 'src/noperfctr_macros.py' ]
            }],
            [ 'node_debug_lib=="false"', {
              'inputs': [ 'tools/nodcheck_macros.py' ]
            }],
            [ 'node_debug_lib=="true"', {
              'inputs': [ 'tools/dcheck_macros.py' ]
            }]
          ],
          'action': [
            'python',
            'tools/js2c.py',
            '<@(_outputs)',
            '<@(_inputs)',
          ],
        },
      ],

As we can see, node.gyp defines an action that simply invokes python tools/js2c.py. A later article will cover it in detail; this is just a quick mention.

GetBinding

So what does GetBinding do?

static void GetBinding(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);

  CHECK(args[0]->IsString());

  Local<String> module = args[0].As<String>();
  node::Utf8Value module_v(env->isolate(), module);

  node_module* mod = get_builtin_module(*module_v);
  Local<Object> exports;
  if (mod != nullptr) {
    exports = InitModule(env, mod, module);
  } else if (!strcmp(*module_v, "constants")) {
    exports = Object::New(env->isolate());
    CHECK(exports->SetPrototype(env->context(),
                                Null(env->isolate())).FromJust());
    DefineConstants(env->isolate(), exports);
  } else if (!strcmp(*module_v, "natives")) {
    exports = Object::New(env->isolate());
    DefineJavaScript(env, exports);
  } else {
    return ThrowIfNoSuchModule(env, *module_v);
  }

  args.GetReturnValue().Set(exports);
}

The logic clearly has three branches:

1. get_builtin_module looks the module up; if it is found (a built-in module), exports = InitModule(env, mod, module)
2. If the name is "constants", DefineConstants fills exports
3. If the name is "natives", DefineJavaScript fills exports

And how does get_builtin_module work?

node_module* get_builtin_module(const char* name) {
  return FindModule(modlist_builtin, name, NM_F_BUILTIN);
}

inline struct node_module* FindModule(struct node_module* list,
                                      const char* name,
                                      int flag) {
  struct node_module* mp;

  for (mp = list; mp != nullptr; mp = mp->nm_link) {
    if (strcmp(mp->nm_modname, name) == 0)
      break;
  }

  CHECK(mp == nullptr || (mp->nm_flags & flag) != 0);
  return mp;
}

Quite simple: it walks modlist_builtin, the linked list that RegisterBuiltinModules (called from Init, as shown above) filled with every built-in module.

InitModule just runs the module's Initialize(); take async_wrap as an example:

void AsyncWrap::Initialize(Local<Object> target,
                           Local<Value> unused,
                           Local<Context> context) {
  Environment* env = Environment::GetCurrent(context);
  Isolate* isolate = env->isolate();
  HandleScope scope(isolate);

  env->BeforeExit(DestroyAsyncIdsCallback, env);

  env->SetMethod(target, "setupHooks", SetupHooks);
  env->SetMethod(target, "pushAsyncIds", PushAsyncIds);
  env->SetMethod(target, "popAsyncIds", PopAsyncIds);
  env->SetMethod(target, "queueDestroyAsyncId", QueueDestroyAsyncId);
  env->SetMethod(target, "enablePromiseHook", EnablePromiseHook);
  env->SetMethod(target, "disablePromiseHook", DisablePromiseHook);
  env->SetMethod(target, "registerDestroyHook", RegisterDestroyHook);

  ......

  env->set_async_hooks_init_function(Local<Function>());
  env->set_async_hooks_before_function(Local<Function>());
  env->set_async_hooks_after_function(Local<Function>());
  env->set_async_hooks_destroy_function(Local<Function>());
  env->set_async_hooks_promise_resolve_function(Local<Function>());
  env->set_async_hooks_binding(target);
}

As async_wrap shows, Initialize just mounts its methods on exports and performs the module's setup.

And what does DefineJavaScript do?

CHECK(target->Set(env->context(),
                  internal_bootstrap_loaders_key.ToStringChecked(env->isolate()),
                  internal_bootstrap_loaders_value.ToStringChecked(env->isolate())).FromJust());

It mounts the modules stored in node_javascript.cc onto exports as key/value pairs; note the ToStringChecked method mentioned above.

ExecuteBootstrapper

This executes internal/bootstrap/loaders.js and internal/bootstrap/node.js; a detailed walkthrough will come in a later article. The essential logic:

if (process._syntax_check_only != null) {
          const fs = NativeModule.require('fs');
          // read the source
          const filename = CJSModule._resolveFilename(process.argv[1]);
          const source = fs.readFileSync(filename, 'utf-8');
          checkScriptSyntax(source, filename);
          process.exit(0);
        }
        CJSModule.runMain();

Check the syntax, then run the main module.

9.Resource cleanup

v8_platform.StopTracingAgent();
  v8_initialized = false;
  V8::Dispose();

  // uv_run cannot be called from the time before the beforeExit callback
  // runs until the program exits unless the event loop has any referenced
  // handles after beforeExit terminates. This prevents unrefed timers
  // that happen to terminate during shutdown from being run unsafely.
  // Since uv_run cannot be called, uv_async handles held by the platform
  // will never be fully cleaned up.
  v8_platform.Dispose();

  delete[] exec_argv;
  exec_argv = nullptr;

  return exit_code;

Here v8_platform, exec_argv and the rest are released, and the run is over.

Summary

This article followed the logic of node::Start from beginning to end, laying out the complete flow of running a node program; later articles will look at the individual pieces and modules involved.

Using a Node.js C++ Addon with Full Understanding

My original plan was simply to write a node C++ addon, but while learning how, I realized that using addons with real understanding requires a grasp of node's architecture, V8, libuv, module loading and so on. The goal of this article, then, is to lay out what it takes to use node addons with confidence.

The article covers:

1. node architecture
2. V8
3. libuv
4. Module loading, as implemented in the node source
5. Addons

node architecture

node's architecture should be familiar; viewed from the angle of module loading it looks like this:

The V8 engine is Google's JavaScript engine, an independently running virtual machine that node pulls in as a third-party dependency (it sits under deps alongside libuv and the others). Beyond running JavaScript, V8 exposes an embedding API with functions for compiling and executing JS, accessing C++ methods and data structures, handling errors, and enabling security checks; it is the crucial bridge between JS and C++ inside node.

libuv is a library originally developed for node that provides cross-platform asynchronous I/O. Built on an asynchronous, event-driven model, it supplies an event loop plus callbacks driven by I/O and other event notifications.

Builtin modules are the C++ modules node ships with.

Native modules are the JS modules node ships with; they are what users call directly, and some of them lean on the builtin modules underneath. Inside a native module, a builtin module is reached through node's process.binding method (introduced in the module-loading section below).

An addon is a node dynamic library written in C++; once it is compiled, users load it directly with require(). Addons extend node's low-level capabilities, typically for compute-intensive modules (C++ runs fast, can tap libuv's async model and event loop, and can use multiple processes and threads).

V8

V8 exposes a public embedding API; see the V8 embedder's guide.

The main concepts, briefly:

An Isolate is an independent V8 instance, in effect an independent virtual machine. It can contain one or more threads, but at any given moment only one of them is executing.

A Context is an execution environment that makes it possible to run mutually isolated, unrelated pieces of JavaScript within a single V8 instance. You must explicitly specify a context for the JavaScript you want to execute, and contexts can be nested.

A Handle is a pointer into heap memory; in V8, JavaScript values and objects all live on the heap, and a handle is a reference to a JS object's location there. Why not just hold raw pointers to JS variables? Because V8's GC may move objects around the heap, and a handle keeps tracking the variable's address across such moves.

A Handle Scope is a container of handles: releasing handles one by one is tedious, so handles are attached to a scope and managed (released, etc.) in one go.

The diagram below is only meant to convey the containment relationship between Isolate, Context, Handle Scope and Handle; it is not accurate in the details.

libuv

libuv is a cross-platform asynchronous I/O library. Its architecture:

The left side of the diagram is network I/O, which uses each platform's most efficient I/O multiplexing mechanism: epoll on Linux, kqueue on OSX and BSD-family systems, event ports on SunOS, and IOCP on Windows.

The right side is file I/O, which implements asynchronous requests and processing on top of a thread pool.

For a detailed walkthrough, see the libuv tutorial.

node module loading

node modules fall into native modules, builtin modules, and constants.

For native modules: after downloading and building the node source, out/Release/obj/gen contains node_natives.h. The file is generated by js2c.py, which converts every character of every js file under lib (plus node.js under src) into its ASCII code and stores them in the corresponding arrays.

Builtin modules are loaded into modlist_builtin during startup, before any user code runs; using one just means fetching it from that linked list. In each builtin module, the macro NODE_BUILTIN_MODULE_CONTEXT_AWARE expands at preprocessing time into a function _register_##modname, which calls node_module_register to add the module to modlist_builtin. In tcp_wrap the macro looks like this:

NODE_BUILTIN_MODULE_CONTEXT_AWARE(tcp_wrap, node::TCPWrap::Initialize)

Module loading

To load node's built-in C++ modules we can call process.binding directly. process.binding is in fact the foundation of node's require(), so require's implementation is covered afterwards as well.

process.binding()

What does process.binding() do?

static void GetBinding(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);

  CHECK(args[0]->IsString());

  Local<String> module = args[0].As<String>();
  node::Utf8Value module_v(env->isolate(), module);

  node_module* mod = get_builtin_module(*module_v);
  Local<Object> exports;
  if (mod != nullptr) {
    exports = InitModule(env, mod, module);
  } else if (!strcmp(*module_v, "constants")) {
    exports = Object::New(env->isolate());
    CHECK(exports->SetPrototype(env->context(),
                                Null(env->isolate())).FromJust());
    DefineConstants(env->isolate(), exports);
  } else if (!strcmp(*module_v, "natives")) {
    exports = Object::New(env->isolate());
    DefineJavaScript(env, exports);
  } else {
    return ThrowIfNoSuchModule(env, *module_v);
  }

  args.GetReturnValue().Set(exports);
}

static Local<Object> InitModule(Environment* env,
                                 node_module* mod,
                                 Local<String> module) {
  Local<Object> exports = Object::New(env->isolate());
  // Internal bindings don't have a "module" object, only exports.
  CHECK_EQ(mod->nm_register_func, nullptr);
  CHECK_NE(mod->nm_context_register_func, nullptr);
  Local<Value> unused = Undefined(env->isolate());
  mod->nm_context_register_func(exports,
                                unused,
                                env->context(),
                                mod->nm_priv);
  return exports;
}

process.binding() handles different kinds of modules differently:

1. Builtin modules are fetched directly from modlist_builtin
2. The constants module is exported via DefineConstants
3. Native modules are fetched from node_natives.h
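The 'natives' branch can be probed from userland. A defensive sketch (process.binding is an internal API and deprecated in recent Node versions, so this guards rather than assumes):

```javascript
// process.binding('natives') returns the map described above: native module
// names to their js source text. Probe defensively, since the API is internal.
try {
  const natives = process.binding('natives');
  // e.g. natives.fs holds the source of lib/fs.js as a string
  console.log('natives available, entry type:', typeof natives.fs);
} catch (e) {
  console.log('process.binding("natives") unavailable:', e.message);
}
```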

require

First, what does Node's require() method do?
// Loads a module at the given file path. Returns that module's
// `exports` property.
Module.prototype.require = function(id) {
  if (typeof id !== 'string') {
    throw new ERR_INVALID_ARG_TYPE('id', 'string', id);
  }
  if (id === '') {
    throw new ERR_INVALID_ARG_VALUE('id', id,
                                    'must be a non-empty string');
  }
  return Module._load(id, this, /* isMain */ false);
};
The first part is just argument validation; so what does Module._load do?
// Check the cache for the requested file.
// 1. If a module already exists in the cache: return its exports object.
// 2. If the module is native: call `NativeModule.require()` with the
//    filename and return the result.
// 3. Otherwise, create a new module for the file and save it to the cache.
//    Then have it load  the file contents before returning its exports
//    object.
Module._load = function(request, parent, isMain) {
  if (parent) {
    debug('Module._load REQUEST %s parent: %s', request, parent.id);
  }

  if (experimentalModules && isMain) {
    asyncESM.loaderPromise.then((loader) => {
      return loader.import(getURLFromFilePath(request).pathname);
    })
    .catch((e) => {
      decorateErrorStack(e);
      console.error(e);
      process.exit(1);
    });
    return;
  }

  var filename = Module._resolveFilename(request, parent, isMain);

  var cachedModule = Module._cache[filename];
  // Return from the cache if present
  if (cachedModule) {
    updateChildren(parent, cachedModule, true);
    return cachedModule.exports;
  }
  
  // Native (builtin) module
  if (NativeModule.nonInternalExists(filename)) {
    debug('load native module %s', request);
    return NativeModule.require(filename);
  }
	
  // Create a new module
  // Don't call updateChildren(), Module constructor already does.
  var module = new Module(filename, parent);

  if (isMain) {
    process.mainModule = module;
    module.id = '.';
  }

  Module._cache[filename] = module;

  tryModuleLoad(module, filename);

  return module.exports;
};

Module._load() does three main things:

1. If the module is already cached, return it from the cache
2. If it is a native module, call `NativeModule.require`
3. Otherwise, create a new module and add it to the cache
Let's drill further down into NativeModule.require:
NativeModule.require = function(id) {
    if (id === loaderId) {
      return loaderExports;
    }

    const cached = NativeModule.getCached(id);
    // Return from the cache if present
    if (cached && (cached.loaded || cached.loading)) {
      return cached.exports;
    }

    if (!NativeModule.exists(id)) {
      // Model the error off the internal/errors.js model, but
      // do not use that module given that it could actually be
      // the one causing the error if there's a bug in Node.js
      // eslint-disable-next-line no-restricted-syntax
      const err = new Error(`No such built-in module: ${id}`);
      err.code = 'ERR_UNKNOWN_BUILTIN_MODULE';
      err.name = 'Error [ERR_UNKNOWN_BUILTIN_MODULE]';
      throw err;
    }

    moduleLoadList.push(`NativeModule ${id}`);

    const nativeModule = new NativeModule(id);

    nativeModule.cache();
    nativeModule.compile();

    return nativeModule.exports;
 };

NativeModule.require does two main things:

1. If the module is cached, return it from the cache
2. Otherwise, push it onto moduleLoadList (a private array in bootstrapInternalLoaders), create a new NativeModule instance, cache it, and finally call `nativeModule.compile()`.

So what does nativeModule.compile() do?
NativeModule.getSource = function(id) {
  return NativeModule._source[id];
};

NativeModule.wrap = function(script) {
  return NativeModule.wrapper[0] + script + NativeModule.wrapper[1];
};

NativeModule.wrapper = ['(function (exports, require, module, __filename, __dirname) {','\n});' ];

NativeModule.prototype.compile = function() {
    let source = NativeModule.getSource(this.id);
    source = NativeModule.wrap(source);

    this.loading = true;

    try {
      const script = new ContextifyScript(source, this.filename);
      // Arguments: timeout, displayErrors, breakOnSigint
      const fn = script.runInThisContext(-1, true, false);
      const requireFn = this.id.startsWith('internal/deps/') ?
        NativeModule.requireForDeps :
        NativeModule.require;
      fn(this.exports, requireFn, this, process);

      this.loaded = true;
    } finally {
      this.loading = false;
    }
};

nativeModule.compile() simply wraps the source and runs it with script.runInThisContext.

So what does script.runInThisContext do?
const {
  ContextifyScript,
  kParsingContext,
  makeContext,
  isContext: _isContext,
} = process.binding('contextify');

class Script extends ContextifyScript {...}

function createScript(code, options) {
  return new Script(code, options);
}

function runInThisContext(code, options) {
  if (typeof options === 'string') {
    options = { filename: options };
  }
  return createScript(code, options).runInThisContext(options);
}

How is RunInThisContext implemented in Contextify?

static void RunInThisContext(const FunctionCallbackInfo<Value>& args) {
    Environment* env = Environment::GetCurrent(args);

    CHECK_EQ(args.Length(), 3);

    CHECK(args[0]->IsNumber());
    int64_t timeout = args[0]->IntegerValue(env->context()).FromJust();

    CHECK(args[1]->IsBoolean());
    bool display_errors = args[1]->IsTrue();

    CHECK(args[2]->IsBoolean());
    bool break_on_sigint = args[2]->IsTrue();

    // Do the eval within this context
    EvalMachine(env, timeout, display_errors, break_on_sigint, args);
}

Node Addon

Loading

Module loading in Node was covered above; so why can a Node Addon be loaded correctly?

The reason is that every Node Addon entry file must #include <node.h>, and node.h contains a number of macro definitions, among them:

#define NODE_MODULE(modname, regfunc)                                 \
  NODE_MODULE_X(modname, regfunc, NULL, 0)  // NOLINT (readability/null_usage)
  
 #define NODE_MODULE_X(modname, regfunc, priv, flags)                  \
  extern "C" {                                                        \
    static node::node_module _module =                                \
    {                                                                 \
      NODE_MODULE_VERSION,                                            \
      flags,                                                          \
      NULL,  /* NOLINT (readability/null_usage) */                    \
      __FILE__,                                                       \
      (node::addon_register_func) (regfunc),                          \
      NULL,  /* NOLINT (readability/null_usage) */                    \
      NODE_STRINGIFY(modname),                                        \
      priv,                                                           \
      NULL   /* NOLINT (readability/null_usage) */                    \
    };                                                                \
    NODE_C_CTOR(_register_ ## modname) {                              \
      node_module_register(&_module);                                 \
    }                                                                 \
  }

Here we meet our familiar node_module_register again: the NODE_MODULE macro we invoke when writing an addon effectively adds the compiled module to modlist_builtin, and calling process.binding() can then load it.

NAN

Why do we need NAN?

Because the underlying APIs of Node.js and V8 change across versions, a native module that depends on a changed API breaks, unless the package maintainer adapts it to the new API; but then the new version of the package may no longer compile on older Node.js.

NAN resolves this awkward situation: nan.h defines many conditional macros that check the current Node version, so when we use the methods defined in nan.h, the compiler expands them into different code depending on the version.

After Node v8.0, the official N-API was introduced. It differs from NAN in that NAN requires recompiling for each version, while N-API abstracts the underlying interfaces so that a single set of APIs works across all versions.

Node.js Source Code: setTimeout & setImmediate & process.nextTick

This article explains the implementation of setTimeout, setImmediate, and process.nextTick; some parts relate to or depend on the event loop covered in the previous article.

setTimeout

setTimeout is first set up in lib/internal/bootstrap/node.js, with the following code:

function setupGlobalTimeouts() {
    const timers = NativeModule.require('timers');
    global.clearImmediate = timers.clearImmediate;
    global.clearInterval = timers.clearInterval;
    global.clearTimeout = timers.clearTimeout;
    global.setImmediate = timers.setImmediate;
    global.setInterval = timers.setInterval;
    global.setTimeout = timers.setTimeout;
}

Now let's look at the setTimeout method in lib/timers.js:

function setTimeout(callback, after, arg1, arg2, arg3) {
  if (typeof callback !== 'function') {
    throw new ERR_INVALID_CALLBACK();
  }

  var i, args;
  switch (arguments.length) {
    // fast cases
    case 1:
    case 2:
      break;
    case 3:
      args = [arg1];
      break;
    case 4:
      args = [arg1, arg2];
      break;
    default:
      args = [arg1, arg2, arg3];
      for (i = 5; i < arguments.length; i++) {
        // extend array dynamically, makes .apply run much faster in v6.0.0
        args[i - 2] = arguments[i];
      }
      break;
  }

  const timeout = new Timeout(callback, after, args, false, false);
  active(timeout);

  return timeout;
}

This does two things:

1. Instantiate a Timeout
2. Call active(timeout)

Following the call path, let's first look at the Timeout class, defined in internal/timers.js:

// Timer constructor function.
// The entire prototype is defined in lib/timers.js
function Timeout(callback, after, args, isRepeat, isUnrefed) {
  after *= 1; // coalesce to number or NaN
  if (!(after >= 1 && after <= TIMEOUT_MAX)) {
    if (after > TIMEOUT_MAX) {
      process.emitWarning(`${after} does not fit into` +
                          ' a 32-bit signed integer.' +
                          '\nTimeout duration was set to 1.',
                          'TimeoutOverflowWarning');
    }
    after = 1; // schedule on next tick, follows browser behavior
  }

  this._called = false;
  this._idleTimeout = after;
  this._idlePrev = this;
  this._idleNext = this;
  this._idleStart = null;
  // this must be set to null first to avoid function tracking
  // on the hidden class, revisit in V8 versions after 6.2
  this._onTimeout = null;
  this._onTimeout = callback;
  this._timerArgs = args;
  this._repeat = isRepeat ? after : null;
  this._destroyed = false;

  this[unrefedSymbol] = isUnrefed;

  initAsyncResource(this, 'Timeout');
}

Two properties deserve attention here:

1. _idleTimeout records the idle time, i.e. how long until the timer should fire
2. _onTimeout records the timer's callback

Now let's look at the active method:

// Schedule or re-schedule a timer.
// The item must have been enroll()'d first.
const active = exports.active = function(item) {
  insert(item, false);
};

// The underlying logic for scheduling or re-scheduling a timer.
//
// Appends a timer onto the end of an existing timers list, or creates a new
// TimerWrap backed list if one does not already exist for the specified timeout
// duration.
function insert(item, unrefed, start) {
  const msecs = item._idleTimeout;
  if (msecs < 0 || msecs === undefined) return;

  if (typeof start === 'number') {
    item._idleStart = start;
  } else {
    item._idleStart = TimerWrap.now();
  }

  const lists = unrefed === true ? unrefedLists : refedLists;

  // Use an existing list if there is one, otherwise we need to make a new one.
  var list = lists[msecs];
  if (list === undefined) {
    debug('no %d list was found in insert, creating a new one', msecs);
    lists[msecs] = list = new TimersList(msecs, unrefed);
  }

  if (!item[async_id_symbol] || item._destroyed) {
    item._destroyed = false;
    initAsyncResource(item, 'Timeout');
  }

  L.append(list, item);
  assert(!L.isEmpty(list)); // list is not empty
}

function TimersList(msecs, unrefed) {
  this._idleNext = this; // Create the list with the linkedlist properties to
  this._idlePrev = this; // prevent any unnecessary hidden class changes.
  this._unrefed = unrefed;
  this.msecs = msecs;

  const timer = this._timer = new TimerWrap();
  timer._list = this;

  if (unrefed === true)
    timer.unref();
  timer.start(msecs);
}

active actually calls insert, and insert does two things:

1. If lists[msecs] does not exist, instantiate a TimersList doubly linked list
2. Append the current Timeout instance to the lists[msecs] list

Let's walk through the key points.

First, msecs: this is item._idleTimeout, i.e. the firing delay. This also shows that lists is a map keyed by delay, whose values are the corresponding TimersLists.

Next, TimersList: its constructor does two things:

1. Instantiate a TimerWrap and assign it to this._timer
2. Call the timer's start method with msecs as the argument

Let's continue to TimerWrap, defined in src/timer_wrap.cc; its constructor is shown below:

static void New(const FunctionCallbackInfo<Value>& args) {
    // This constructor should not be exposed to public javascript.
    // Therefore we assert that we are not trying to call this as a
    // normal function.
    CHECK(args.IsConstructCall());
    Environment* env = Environment::GetCurrent(args);
    new TimerWrap(env, args.This());
  }

  TimerWrap(Environment* env, Local<Object> object)
      : HandleWrap(env,
                   object,
                   reinterpret_cast<uv_handle_t*>(&handle_),
                   AsyncWrap::PROVIDER_TIMERWRAP) {
    int r = uv_timer_init(env->event_loop(), &handle_);
    CHECK_EQ(r, 0);
  }

static void Start(const FunctionCallbackInfo<Value>& args) {
    TimerWrap* wrap;
    ASSIGN_OR_RETURN_UNWRAP(&wrap, args.Holder());

    CHECK(HandleWrap::IsAlive(wrap));

    int64_t timeout = args[0]->IntegerValue();
    int err = uv_timer_start(&wrap->handle_, OnTimeout, timeout, 0);
    args.GetReturnValue().Set(err);
  }

The TimerWrap constructor calls libuv's uv_timer_init to initialize the member handle_. The Start method calls uv_timer_start, which sets handle_->timer_cb to the callback (OnTimeout) and inserts handle->heap_node into the event loop's timer min-heap, which ties back to the event loop covered in the previous article. Next, let's see what OnTimeout does when this timer fires:

static void OnTimeout(uv_timer_t* handle) {
    TimerWrap* wrap = static_cast<TimerWrap*>(handle->data);
    Environment* env = wrap->env();
    HandleScope handle_scope(env->isolate());
    Context::Scope context_scope(env->context());
    Local<Value> ret;
    Local<Value> args[1];
    do {
      args[0] = env->GetNow();
      ret = wrap->MakeCallback(env->timers_callback_function(), 1, args)
                .ToLocalChecked();
    } while (ret->IsUndefined() &&
             !env->tick_info()->has_thrown() &&
             wrap->object()->Get(env->context(),
                                 env->owner_string()).ToLocalChecked()
                                                     ->IsUndefined());
  }
 

Here we first fetch handle->data, then call wrap->MakeCallback to run the callback. Three things are worth noting:

1. What handle->data is
2. What wrap->MakeCallback is
3. And what env->timers_callback_function() is

handle->data is initialized in TimerWrap's constructor, or more precisely in the constructor of its parent class HandleWrap, which looks like this:

HandleWrap::HandleWrap(Environment* env,
                       Local<Object> object,
                       uv_handle_t* handle,
                       AsyncWrap::ProviderType provider)
    : AsyncWrap(env, object, provider),
      state_(kInitialized),
      handle_(handle) {
  handle_->data = this;
  HandleScope scope(env->isolate());
  env->handle_wrap_queue()->PushBack(this);
}

We can see that this is the current HandleWrap instance, so after fetching it, static_cast converts it back to its subclass TimerWrap.

wrap->MakeCallback actually calls the MakeCallback method of AsyncWrap (the parent of its parent class), whose code is:

MaybeLocal<Value> AsyncWrap::MakeCallback(const Local<Function> cb,
                                          int argc,
                                          Local<Value>* argv) {
  EmitTraceEventBefore();

  ProviderType provider = provider_type();
  async_context context { get_async_id(), get_trigger_async_id() };
  MaybeLocal<Value> ret = InternalMakeCallback(
      env(), object(), cb, argc, argv, context);

  // This is a static call with cached values because the `this` object may
  // no longer be alive at this point.
  EmitTraceEventAfter(provider, context.async_id);

  return ret;
}

This in turn calls the InternalMakeCallback method in node.cc:

MaybeLocal<Value> InternalMakeCallback(Environment* env,
                                       Local<Object> recv,
                                       const Local<Function> callback,
                                       int argc,
                                       Local<Value> argv[],
                                       async_context asyncContext) {
  CHECK(!recv.IsEmpty());
  InternalCallbackScope scope(env, recv, asyncContext);
  if (scope.Failed()) {
    return Undefined(env->isolate());
  }

  Local<Function> domain_cb = env->domain_callback();
  MaybeLocal<Value> ret;
  if (asyncContext.async_id != 0 || domain_cb.IsEmpty() || recv.IsEmpty()) {
    ret = callback->Call(env->context(), recv, argc, argv);
  } else {
    std::vector<Local<Value>> args(1 + argc);
    args[0] = callback;
    std::copy(&argv[0], &argv[argc], args.begin() + 1);
    ret = domain_cb->Call(env->context(), recv, args.size(), &args[0]);
  }

  if (ret.IsEmpty()) {
    // NOTE: For backwards compatibility with public API we return Undefined()
    // if the top level call threw.
    scope.MarkAsFailed();
    return scope.IsInnerMakeCallback() ? ret : Undefined(env->isolate());
  }

  scope.Close();
  if (scope.Failed()) {
    return Undefined(env->isolate());
  }

  return ret;
}

The line to focus on in InternalMakeCallback is ret = callback->Call(env->context(), recv, argc, argv);, which executes the callback, i.e. env->timers_callback_function(). So what is that function?

Going back to lib/timers.js, we see the following logic in the module:

const [immediateInfo, toggleImmediateRef] =
  setupTimers(processImmediate, processTimers);
 
function processTimers(now) {
  if (this.owner)
    return unrefdHandle(this.owner, now);
  return listOnTimeout(this, now);
}

setupTimers is defined in timer_wrap.cc as follows:

static void SetupTimers(const FunctionCallbackInfo<Value>& args) {
    CHECK(args[0]->IsFunction());
    CHECK(args[1]->IsFunction());
    auto env = Environment::GetCurrent(args);

    env->set_immediate_callback_function(args[0].As<Function>());
    env->set_timers_callback_function(args[1].As<Function>());

    auto toggle_ref_cb = [] (const FunctionCallbackInfo<Value>& args) {
      Environment::GetCurrent(args)->ToggleImmediateRef(args[0]->IsTrue());
    };
    auto toggle_ref_function =
        env->NewFunctionTemplate(toggle_ref_cb)->GetFunction(env->context())
        .ToLocalChecked();
    auto result = Array::New(env->isolate(), 2);
    result->Set(env->context(), 0,
                env->immediate_info()->fields().GetJSArray()).FromJust();
    result->Set(env->context(), 1, toggle_ref_function).FromJust();
    args.GetReturnValue().Set(result);
  }

Here we can see that it simply registers processImmediate as immediate_callback_function and processTimers as timers_callback_function.

Now everything is clear: timers.js first registers timers_callback_function, and when a timeout fires, wrap->MakeCallback(env->timers_callback_function(), 1, args).ToLocalChecked(); invokes the js callback, which runs the callbacks in the corresponding queue.

setImmediate

The entry code of setImmediate is as follows:

function setImmediate(callback, arg1, arg2, arg3) {
  if (typeof callback !== 'function') {
    throw new ERR_INVALID_CALLBACK();
  }

  var i, args;
  switch (arguments.length) {
    // fast cases
    case 1:
      break;
    case 2:
      args = [arg1];
      break;
    case 3:
      args = [arg1, arg2];
      break;
    default:
      args = [arg1, arg2, arg3];
      for (i = 4; i < arguments.length; i++) {
        // extend array dynamically, makes .apply run much faster in v6.0.0
        args[i - 1] = arguments[i];
      }
      break;
  }

  return new Immediate(callback, args);
}

This only instantiates an Immediate object; so what is inside Immediate?

const Immediate = class Immediate {
  constructor(callback, args) {
    this._idleNext = null;
    this._idlePrev = null;
    // this must be set to null first to avoid function tracking
    // on the hidden class, revisit in V8 versions after 6.2
    this._onImmediate = null;
    this._onImmediate = callback;
    this._argv = args;
    this._destroyed = false;
    this[kRefed] = false;

    initAsyncResource(this, 'Immediate');

    this.ref();
    immediateInfo[kCount]++;

    immediateQueue.append(this);
  }

It simply registers the callback and appends itself to the linked list.

So when does setImmediate run? Recall this code from the setTimeout discussion:

const [immediateInfo, toggleImmediateRef] =
  setupTimers(processImmediate, processTimers);

Here setupTimers registers the js function processImmediate as immediate_callback_function; so when is that function invoked?

Tracing immediate_callback_function, we find its call site in env.cc:

void Environment::CheckImmediate(uv_check_t* handle) {
  Environment* env = Environment::from_immediate_check_handle(handle);

  if (env->immediate_info()->count() == 0)
    return;

  HandleScope scope(env->isolate());
  Context::Scope context_scope(env->context());

  env->RunAndClearNativeImmediates();

  do {
    MakeCallback(env->isolate(),
                 env->process_object(),
                 env->immediate_callback_function(),
                 0,
                 nullptr,
                 {0, 0}).ToLocalChecked();
  } while (env->immediate_info()->has_outstanding());

  if (env->immediate_info()->ref_count() == 0)
    env->ToggleImmediateRef(false);
}

CheckImmediate itself is registered in Environment::Start; see the following code:

void Environment::Start(int argc,
                        const char* const* argv,
                        int exec_argc,
                        const char* const* exec_argv,
                        bool start_profiler_idle_notifier) {
  HandleScope handle_scope(isolate());
  Context::Scope context_scope(context());

  // Initialize the check phase handle
  uv_check_init(event_loop(), immediate_check_handle());
  // Decrement the loop's active handle count
  uv_unref(reinterpret_cast<uv_handle_t*>(immediate_check_handle()));

  // Initialize the idle phase handle
  uv_idle_init(event_loop(), immediate_idle_handle());

  // Start running CheckImmediate in the check phase
  uv_check_start(immediate_check_handle(), CheckImmediate);
  
  ......
}

So it is Environment::Start that adds CheckImmediate to the check-phase queue of the event loop.

This confirms that setImmediate callbacks are executed in the check phase.

process.nextTick

We know that callbacks scheduled with process.nextTick run after each phase of the event loop. How is that achieved? Let's find out.

process.nextTick is initialized in internal/bootstrap/node.js:

NativeModule.require('internal/process/next_tick').setup();

process.nextTick() actually calls the nextTick() method in internal/process/next_tick.js:

// `nextTick()` will not enqueue any callback when the process is about to
  // exit since the callback would not have a chance to be executed.
  function nextTick(callback) {
    if (typeof callback !== 'function')
      throw new ERR_INVALID_CALLBACK();

    if (process._exiting)
      return;

    var args;
    switch (arguments.length) {
      case 1: break;
      case 2: args = [arguments[1]]; break;
      case 3: args = [arguments[1], arguments[2]]; break;
      case 4: args = [arguments[1], arguments[2], arguments[3]]; break;
      default:
        args = new Array(arguments.length - 1);
        for (var i = 1; i < arguments.length; i++)
          args[i - 1] = arguments[i];
    }

    if (queue.isEmpty())
      tickInfo[kHasScheduled] = 1;
    queue.push(new TickObject(callback, args, getDefaultTriggerAsyncId()));
 }

This first instantiates a TickObject and pushes it onto the queue. The TickObject class is defined as follows:

class TickObject {
    constructor(callback, args, triggerAsyncId) {
      // this must be set to null first to avoid function tracking
      // on the hidden class, revisit in V8 versions after 6.2
      this.callback = null;
      this.callback = callback;
      this.args = args;

      const asyncId = newAsyncId();
      this[async_id_symbol] = asyncId;
      this[trigger_async_id_symbol] = triggerAsyncId;

      if (initHooksExist()) {
        emitInit(asyncId,
                 'TickObject',
                 triggerAsyncId,
                 this);
      }
    }
  }

It simply stores the callback on a property.

So when are the callbacks scheduled via process.nextTick() actually invoked?

We notice this registration code in next_tick.js:

 // tickInfo is used so that the C++ code in src/node.cc can
  // have easy access to our nextTick state, and avoid unnecessary
  // calls into JS land.
  // runMicrotasks is used to run V8's micro task queue.
  // set_tick_callback_function()
  const [
    tickInfo,
    runMicrotasks
  ] = process._setupNextTick(_tickCallback);

The process._setupNextTick method is defined in node.cc; the code is:

void SetupNextTick(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);

  CHECK(args[0]->IsFunction());

  env->set_tick_callback_function(args[0].As<Function>());

  env->process_object()->Delete(
      env->context(),
      FIXED_ONE_BYTE_STRING(env->isolate(), "_setupNextTick")).FromJust();

  v8::Local<v8::Function> run_microtasks_fn =
      env->NewFunctionTemplate(RunMicrotasks)->GetFunction(env->context())
          .ToLocalChecked();
  run_microtasks_fn->SetName(
      FIXED_ONE_BYTE_STRING(env->isolate(), "runMicrotasks"));

  Local<Array> ret = Array::New(env->isolate(), 2);
  ret->Set(env->context(), 0,
           env->tick_info()->fields().GetJSArray()).FromJust();
  ret->Set(env->context(), 1, run_microtasks_fn).FromJust();

  args.GetReturnValue().Set(ret);
}

This does three main things:

1. set_tick_callback_function, i.e. it makes tick_callback_function the _tickCallback method from next_tick.js
2. Exposes RunMicrotasks, the V8 method for running microtasks
3. Exposes the tick_info property

So invoking tick_callback_function means invoking our _tickCallback method, which drains the nextTick queue; all that remains is to find out when tick_callback_function is invoked.

Tracing tick_callback_function, we find that it is called in InternalCallbackScope::Close:

void InternalCallbackScope::Close() {
  ......

  if (env_->tick_callback_function()->Call(process, 0, nullptr).IsEmpty()) {
    env_->tick_info()->set_has_thrown(true);
    failed_ = true;
  }
}

InternalCallbackScope looks familiar: it appeared in InternalMakeCallback above. Indeed:

MaybeLocal<Value> InternalMakeCallback(Environment* env,
                                       Local<Object> recv,
                                       const Local<Function> callback,
                                       int argc,
                                       Local<Value> argv[],
                                       async_context asyncContext) {
  CHECK(!recv.IsEmpty());
  InternalCallbackScope scope(env, recv, asyncContext);
  if (scope.Failed()) {
    return Undefined(env->isolate());
  }

  ...

  scope.Close();
  if (scope.Failed()) {
    return Undefined(env->isolate());
  }

  return ret;
}

InternalCallbackScope::Close is called in InternalMakeCallback; we also know that InternalMakeCallback is called from MakeCallback (its code appeared above, so it is not repeated here), and MakeCallback is what timer_wrap.cc invoked earlier. This confirms that callbacks scheduled with process.nextTick run after each phase of the event loop.

Going through the logic forward: _tickCallback in next_tick.js is assigned to tick_callback_function; after each event-loop phase runs, the corresponding MakeCallback is invoked, which calls InternalMakeCallback, which finally calls InternalCallbackScope::Close, and that runs the _tickCallback callback.

One thing to note: InternalCallbackScope::Close is actually executed twice here, because the destructor of InternalCallbackScope also calls Close. InternalMakeCallback defines a stack object scope of type InternalCallbackScope, so when the call stack unwinds, the destructor runs and Close is called again; by that point, of course, the nextTick queue has already been drained.

Finally, let's look at the _tickCallback method in next_tick.js:

function _tickCallback() {
    let tock;
    do {
      while (tock = queue.shift()) {
        const asyncId = tock[async_id_symbol];
        emitBefore(asyncId, tock[trigger_async_id_symbol]);
        // emitDestroy() places the async_id_symbol into an asynchronous queue
        // that calls the destroy callback in the future. It's called before
        // calling tock.callback so destroy will be called even if the callback
        // throws an exception that is handled by 'uncaughtException' or a
        // domain.
        // TODO(trevnorris): This is a bit of a hack. It relies on the fact
        // that nextTick() doesn't allow the event loop to proceed, but if
        // any async hooks are enabled during the callback's execution then
        // this tock's after hook will be called, but not its destroy hook.
        if (destroyHooksExist())
          emitDestroy(asyncId);

        const callback = tock.callback;
        if (tock.args === undefined)
          callback();
        else
          Reflect.apply(callback, undefined, tock.args);

        emitAfter(asyncId);
      }
      tickInfo[kHasScheduled] = 0;
      runMicrotasks();
    } while (!queue.isEmpty() || emitPromiseRejectionWarnings());
    tickInfo[kHasPromiseRejections] = 0;
  }

This does two main things:

1. Run the callbacks in the queue
2. Call runMicrotasks to run the microtasks

Note that the process.nextTick() queue and the microtasks are drained once before the event loop even starts. The official Node documentation states:

When Node.js starts, it initializes the event loop, processes the provided input script (or drops into the REPL, which is not covered in this document) which may make async API calls, schedule timers, or call process.nextTick(), then begins processing the event loop.

The key to this logic is runMain in lib/internal/modules/cjs/loader.js, shown below:

// bootstrap main module.
Module.runMain = function() {
  // Load the main module--the command line argument.
  Module._load(process.argv[1], null, true);
  // Handle any nextTicks added in the first tick of the program
  process._tickCallback();
};

This method runs the user code and then runs process.nextTick and the microtasks. It is invoked (via ExecuteBootstrapper) from loadEnvironment in node.cc, i.e. before uv_run, which confirms the statement above that process.nextTick()/microtasks are drained before the loop starts.

Summary

We covered the implementation of setTimeout, setImmediate, and process.nextTick. setTimeout and setImmediate rely on the timer and check phases of the event loop, while process.nextTick, by registering tick_callback_function, achieves the effect of running after every phase of the event loop.

Node.js Source Code: bootstrap_node.js

A previous article mentioned that LoadEnvironment in src/node.cc executes internal/bootstrap/loaders.js and internal/bootstrap/node.js. This article looks at what these two modules do. Note that it covers the well-known vm module, module loading, and more.

GetBootstrapper

First, let's see how the contents of internal/bootstrap/loaders.js and internal/bootstrap/node.js are obtained.

We notice that both files have the following shape:

(function(){})

So how is the function inside obtained?

void LoadEnvironment(Environment* env) {
	......
	Local<Function> loaders_bootstrapper =
      GetBootstrapper(env, LoadersBootstrapperSource(env), loaders_name);
   .......
}

static Local<Function> GetBootstrapper(Environment* env, Local<String> source,
	......
  // Execute the bootstrapper javascript file
  Local<Value> bootstrapper_v = ExecuteString(env, source, script_name);
  ......
 }

It turns out GetBootstrapper first executes the file, which yields the function inside it; afterwards that function can be invoked directly.

internal/bootstrap/loaders.js

internal/bootstrap/loaders.js is essentially the loader for native modules. Its inputs are process, getBinding, getLinkedBinding, and getInternalBinding; its outputs are the NativeModule constructor and the internalBinding method.

Let's walk through the code step by step:

1. Set up process.binding and internalBinding

// Set up process.binding() and process._linkedBinding()
  {
    const bindingObj = Object.create(null);

    process.binding = function binding(module) {
      module = String(module);
      let mod = bindingObj[module];
      if (typeof mod !== 'object') {
        mod = bindingObj[module] = getBinding(module);
        moduleLoadList.push(`Binding ${module}`);
      }
      return mod;
    };

    process._linkedBinding = function _linkedBinding(module) {
      module = String(module);
      let mod = bindingObj[module];
      if (typeof mod !== 'object')
        mod = bindingObj[module] = getLinkedBinding(module);
      return mod;
    };
}
// Set up internalBinding() in the closure
  let internalBinding;
  {
    const bindingObj = Object.create(null);
    internalBinding = function internalBinding(module) {
      let mod = bindingObj[module];
      if (typeof mod !== 'object') {
        mod = bindingObj[module] = getInternalBinding(module);
        moduleLoadList.push(`Internal Binding ${module}`);
      }
      return mod;
    };
  }

process.binding and process._linkedBinding actually call the GetBinding and GetLinkedBinding methods in src/node.cc, while internalBinding calls GetInternalBinding.

2. Import the node_contextify module

const ContextifyScript = process.binding('contextify').ContextifyScript;

contextify is one of the most important modules in Node; its main purpose is to execute js code.

Two methods are mounted on the ContextifyScript js class: RunInContext and RunInThisContext (covered in detail later). The mounting uses V8's env->SetProtoMethod to attach C++ methods to the js class's prototype, as shown below:

static void Init(Environment* env, Local<Object> target) {
    HandleScope scope(env->isolate());
    Local<String> class_name =
        FIXED_ONE_BYTE_STRING(env->isolate(), "ContextifyScript");

    Local<FunctionTemplate> script_tmpl = env->NewFunctionTemplate(New);
    script_tmpl->InstanceTemplate()->SetInternalFieldCount(1);
    script_tmpl->SetClassName(class_name);
    env->SetProtoMethod(script_tmpl, "runInContext", RunInContext);
    env->SetProtoMethod(script_tmpl, "runInThisContext", RunInThisContext);

    target->Set(class_name, script_tmpl->GetFunction());
    env->set_script_context_constructor_template(script_tmpl);

    Local<Symbol> parsing_context_symbol =
        Symbol::New(env->isolate(),
                    FIXED_ONE_BYTE_STRING(env->isolate(),
                                          "script parsing context"));
    env->set_vm_parsing_context_symbol(parsing_context_symbol);
    target->Set(env->context(),
                FIXED_ONE_BYTE_STRING(env->isolate(), "kParsingContext"),
                parsing_context_symbol)
        .FromJust();
  }

3. Defining the NativeModule constructor

// Set up NativeModule
  // Define the NativeModule constructor
  function NativeModule(id) {
    this.filename = `${id}.js`;
    this.id = id;
    this.exports = {};
    this.loaded = false;
    this.loading = false;
  }

  // _source is a map of all native modules
  NativeModule._source = getBinding('natives');
  NativeModule._cache = {};

  const config = getBinding('config');

A NativeModule mainly holds an id, a filename, an exports object, and so on — this is the data structure of a native module.

NativeModule._source stores a map of all native modules; the key is the module name and the value is the ASCII representation of its source.

4. NativeModule.require

As the name suggests, NativeModule.require loads a native module: its input is the module id and its output is the module's exports property. The code is as follows:

NativeModule.require = function(id) {
    if (id === loaderId) {
      return loaderExports;
    }

    const cached = NativeModule.getCached(id);
    if (cached && (cached.loaded || cached.loading)) {
      return cached.exports;
    }

    if (!NativeModule.exists(id)) {
      // Model the error off the internal/errors.js model, but
      // do not use that module given that it could actually be
      // the one causing the error if there's a bug in Node.js
      // eslint-disable-next-line no-restricted-syntax
      const err = new Error(`No such built-in module: ${id}`);
      err.code = 'ERR_UNKNOWN_BUILTIN_MODULE';
      err.name = 'Error [ERR_UNKNOWN_BUILTIN_MODULE]';
      throw err;
    }

    moduleLoadList.push(`NativeModule ${id}`);

    const nativeModule = new NativeModule(id);

    nativeModule.cache();
    nativeModule.compile();

    return nativeModule.exports;
  };

This does the following:

1. Check the cache; if the module is cached, return cached.exports directly. The cache lives in the static property NativeModule._cache.
2. Check whether the module exists; if not, throw an error. The exists check relies on NativeModule._source.
3. Create a nativeModule instance.
4. Cache it.
5. Compile the nativeModule.
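The flow above can be sketched in a few lines of plain JavaScript. The names `sources`, `cache`, and `fakeRequire` are illustrative only — this is not Node's actual internal code:

```javascript
// Minimal sketch of NativeModule.require's cache-or-compile flow.
// `sources` stands in for NativeModule._source.
const sources = {
  greet: 'module.exports = function () { return "hello"; };'
};
const cache = {};

function fakeRequire(id) {
  if (cache[id]) return cache[id].exports;          // 1. cache hit
  if (!(id in sources)) {                           // 2. unknown module
    const err = new Error(`No such built-in module: ${id}`);
    err.code = 'ERR_UNKNOWN_BUILTIN_MODULE';
    throw err;
  }
  const mod = { id, exports: {} };                  // 3. create the instance
  cache[id] = mod;                                  // 4. cache it
  // 5. compile: wrap the source, then run it with exports/require/module
  const fn = new Function('exports', 'require', 'module', sources[id]);
  fn(mod.exports, fakeRequire, mod);
  return mod.exports;
}
```

Repeated calls for the same id return the same exports object, which is exactly why step 1 must run before compilation.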

Compilation and execution (nativeModule.compile)

The crucial step above is nativeModule.compile(), which we look at next.

NativeModule.prototype.compile = function() {
    let source = NativeModule.getSource(this.id);
    source = NativeModule.wrap(source);

    this.loading = true;

    try {
      const script = new ContextifyScript(source, this.filename);
      // Arguments: timeout, displayErrors, breakOnSigint
      // Returns function (exports, require, module, process) {...}
      const fn = script.runInThisContext(-1, true, false);
      const requireFn = this.id.startsWith('internal/deps/') ?
        NativeModule.requireForDeps :
        NativeModule.require;
      // Execute; the results land directly on this.exports
      fn(this.exports, requireFn, this, process);

      this.loaded = true;
    } finally {
      this.loading = false;
    }
  };

This does the following:

1. Fetch the source.
2. Wrap it.
3. Create a ContextifyScript instance, script.
4. Call script.runInThisContext to obtain a function fn.
5. Invoke fn, passing the nativeModule's exports property, the require function, the nativeModule instance itself, and process.

ContextifyScript deserves a closer look here, because the well-known vm module is ultimately built on it.

Creating the ContextifyScript instance — new ContextifyScript(source, this.filename) — actually invokes the class's static New method; in other words, it instantiates class ContextifyScript.

script.runInThisContext calls the static method RunInThisContext of class ContextifyScript, shown below:

static void RunInThisContext(const FunctionCallbackInfo<Value>& args) {
    Environment* env = Environment::GetCurrent(args);

    CHECK_EQ(args.Length(), 3);

    CHECK(args[0]->IsNumber());
    int64_t timeout = args[0]->IntegerValue(env->context()).FromJust();

    CHECK(args[1]->IsBoolean());
    bool display_errors = args[1]->IsTrue();

    CHECK(args[2]->IsBoolean());
    bool break_on_sigint = args[2]->IsTrue();

    // Do the eval within this context
    EvalMachine(env, timeout, display_errors, break_on_sigint, args);
  }

The code above mainly validates the arguments and then calls the static method EvalMachine; the key to executing the JS code lies in EvalMachine, which we examine next.

static bool EvalMachine(Environment* env,
                          const int64_t timeout,
                          const bool display_errors,
                          const bool break_on_sigint,
                          const FunctionCallbackInfo<Value>& args) {
    if (!ContextifyScript::InstanceOf(env, args.Holder())) {
      env->ThrowTypeError(
          "Script methods can only be called on script instances.");
      return false;
    }
    // Obtain the Local<Script> instance script
    TryCatch try_catch(env->isolate());
    ContextifyScript* wrapped_script;
    ASSIGN_OR_RETURN_UNWRAP(&wrapped_script, args.Holder(), false);
    Local<UnboundScript> unbound_script =
        PersistentToLocal(env->isolate(), wrapped_script->script_);
    Local<Script> script = unbound_script->BindToCurrentContext();

    // Execute
    MaybeLocal<Value> result;
    bool timed_out = false;
    bool received_signal = false;
    if (break_on_sigint && timeout != -1) {
      Watchdog wd(env->isolate(), timeout, &timed_out);
      SigintWatchdog swd(env->isolate(), &received_signal);
      result = script->Run(env->context());
    } else if (break_on_sigint) {
      SigintWatchdog swd(env->isolate(), &received_signal);
      result = script->Run(env->context());
    } else if (timeout != -1) {
      Watchdog wd(env->isolate(), timeout, &timed_out);
      result = script->Run(env->context());
    } else {
      result = script->Run(env->context());
    }

    ......

    args.GetReturnValue().Set(result.ToLocalChecked());
    return true;
  }

Two things happen here:

1. Obtain the Local<Script> object script.
2. Call script->Run to execute it (in some cases under a watchdog).

The Watchdog monitors execution timeouts, and SigintWatchdog listens for signals. Let's look at how the watchdog is implemented.

Watchdog::Watchdog(v8::Isolate* isolate, uint64_t ms, bool* timed_out)
    : isolate_(isolate), timed_out_(timed_out) {

  int rc;
  loop_ = new uv_loop_t;
  CHECK(loop_);
  rc = uv_loop_init(loop_);
  if (rc != 0) {
    FatalError("node::Watchdog::Watchdog()",
               "Failed to initialize uv loop.");
  }

  // Inter-thread communication
  rc = uv_async_init(loop_, &async_, &Watchdog::Async);
  CHECK_EQ(0, rc);

  // Start the timer
  rc = uv_timer_init(loop_, &timer_);
  CHECK_EQ(0, rc);

  rc = uv_timer_start(&timer_, &Watchdog::Timer, ms, 0);
  CHECK_EQ(0, rc);

  rc = uv_thread_create(&thread_, &Watchdog::Run, this);
  CHECK_EQ(0, rc);
}


Watchdog::~Watchdog() {
  uv_async_send(&async_);
  uv_thread_join(&thread_);

  uv_close(reinterpret_cast<uv_handle_t*>(&async_), nullptr);

  // Clean up the loop
  // UV_RUN_DEFAULT so that libuv has a chance to clean up.
  uv_run(loop_, UV_RUN_DEFAULT);

  // Release the resources tied to the loop_ pointer
  int rc = uv_loop_close(loop_);
  CHECK_EQ(0, rc);
  delete loop_;
  loop_ = nullptr;
}


void Watchdog::Run(void* arg) {
  Watchdog* wd = static_cast<Watchdog*>(arg);

  // UV_RUN_DEFAULT the loop will be stopped either by the async or the
  // timer handle.
  // UV_RUN_DEFAULT: the default loop mode; the loop keeps repeating until
  // the loop's reference count (ref) drops to 0.
  uv_run(wd->loop_, UV_RUN_DEFAULT);

  // Loop ref count reaches zero when both handles are closed.
  // Close the timer handle on this side and let ~Watchdog() close async_
  uv_close(reinterpret_cast<uv_handle_t*>(&wd->timer_), nullptr);
}


void Watchdog::Async(uv_async_t* async) {
  Watchdog* w = ContainerOf(&Watchdog::async_, async);
  uv_stop(w->loop_);
}

Node's watchdog works by spawning a new thread. One piece of background: the second argument to uv_run selects the event-loop mode, and UV_RUN_DEFAULT — the default — keeps repeating the loop until the loop's reference count (ref) drops to 0.

The timeout flow runs roughly like this: in the new thread, uv_run starts the event loop; when the timer fires, uv_stop halts the loop and uv_close then closes the timer handle, leaving async_ as the loop's only remaining ref. Next the Watchdog is destructed and a signal is sent to the main thread; on receiving it, the main thread calls w->isolate()->TerminateExecution(), and finally the event loop is cleaned up.

internal/bootstrap/node.js

internal/bootstrap/node.js builds on internal/bootstrap/loader.js: it performs a series of initialization steps and finally uses CJS module resolution to find and execute the user's code. The flow is laid out below, followed by a closer look at CJSModule (this is also where module loading lives).

Flow

1. Initialization

setupProcessObject();

    // do this good and early, since it handles errors.
    setupProcessFatal();

    // Initialize map iterators, etc.
    setupV8();
    // Internationalization
    setupProcessICUVersions();

    // Attach Symbol.toStringTag, Buffer, etc. to global
    setupGlobalVariables();

    const _process = NativeModule.require('internal/process');
    _process.setupConfig(NativeModule._source);
    // Signal handling
    _process.setupSignalHandlers();
    _process.setupUncaughtExceptionCapture(exceptionHandlerState);
    // Initialize the warning and other modules
    NativeModule.require('internal/process/warning').setup();
    NativeModule.require('internal/process/next_tick').setup();
    NativeModule.require('internal/process/stdio').setup();
    NativeModule.require('internal/process/methods').setup();

    const perf = process.binding('performance');
    const {
      NODE_PERFORMANCE_MILESTONE_BOOTSTRAP_COMPLETE,
      NODE_PERFORMANCE_MILESTONE_THIRD_PARTY_MAIN_START,
      NODE_PERFORMANCE_MILESTONE_THIRD_PARTY_MAIN_END,
      NODE_PERFORMANCE_MILESTONE_CLUSTER_SETUP_START,
      NODE_PERFORMANCE_MILESTONE_CLUSTER_SETUP_END,
      NODE_PERFORMANCE_MILESTONE_MODULE_LOAD_START,
      NODE_PERFORMANCE_MILESTONE_MODULE_LOAD_END,
      NODE_PERFORMANCE_MILESTONE_PRELOAD_MODULE_LOAD_START,
      NODE_PERFORMANCE_MILESTONE_PRELOAD_MODULE_LOAD_END
    } = perf.constants;

    _process.setup_hrtime();
    _process.setup_performance();
    _process.setup_cpuUsage();
    // Calls isolate->GetHeapStatistics(&v8_heap_stats);
    _process.setupMemoryUsage();
    _process.setupKillAndExit();
    if (global.__coverage__)
      NativeModule.require('internal/process/write-coverage').setup();

    NativeModule.require('internal/trace_events_async_hooks').setup();
    NativeModule.require('internal/inspector_async_hook').setup();

    _process.setupChannel();
    _process.setupRawDebug();

    const browserGlobals = !process._noBrowserGlobals;
    if (browserGlobals) {
      setupGlobalTimeouts();
      setupGlobalConsole();
      setupGlobalURL();
    }

    // Ensure setURLConstructor() is called before the native
    // URL::ToObject() method is used.
    NativeModule.require('internal/url');

    // On OpenBSD process.execPath will be relative unless we
    // get the full path before process.execPath is used.
    if (process.platform === 'openbsd') {
      const { realpathSync } = NativeModule.require('fs');
      process.execPath = realpathSync.native(process.execPath);
    }

    Object.defineProperty(process, 'argv0', {
      enumerable: true,
      configurable: false,
      value: process.argv[0]
    });
    process.argv[0] = process.execPath;

    // Handle `--debug*` deprecation and invalidation
    if (process._invalidDebug) {
      process.emitWarning(
        '`node --debug` and `node --debug-brk` are invalid. ' +
        'Please use `node --inspect` or `node --inspect-brk` instead.',
        'DeprecationWarning', 'DEP0062', startup, true);
      process.exit(9);
    } else if (process._deprecatedDebugBrk) {
      process.emitWarning(
        '`node --inspect --debug-brk` is deprecated. ' +
        'Please use `node --inspect-brk` instead.',
        'DeprecationWarning', 'DEP0062', startup, true);
    }

    if (process.binding('config').experimentalModules ||
        process.binding('config').experimentalVMModules) {
      if (process.binding('config').experimentalModules) {
        process.emitWarning(
          'The ESM module loader is experimental.',
          'ExperimentalWarning', undefined);
      }
      NativeModule.require('internal/process/esm_loader').setup();
    }

    // Deprecated-method notices
    {
      // Install legacy getters on the `util` binding for typechecking.
      // TODO(addaleax): Turn into a full runtime deprecation.
      const { pendingDeprecation } = process.binding('config');
      const { deprecate } = NativeModule.require('internal/util');
      const utilBinding = process.binding('util');
      const types = internalBinding('types');
      for (const name of [
        'isArrayBuffer', 'isArrayBufferView', 'isAsyncFunction',
        'isDataView', 'isDate', 'isExternal', 'isMap', 'isMapIterator',
        'isNativeError', 'isPromise', 'isRegExp', 'isSet', 'isSetIterator',
        'isTypedArray', 'isUint8Array', 'isAnyArrayBuffer'
      ]) {
        utilBinding[name] = pendingDeprecation ?
          deprecate(types[name],
                    'Accessing native typechecking bindings of Node ' +
                    'directly is deprecated. ' +
                    `Please use \`util.types.${name}\` instead.`,
                    'DEP0103') :
          types[name];
      }
    }

The main initialization steps here:

1. Set up process's set_push_values_to_array_function.
2. Initialize V8, map iterators, etc.
3. Attach Symbol.toStringTag, Buffer, and other properties to global.
4. Install signal handlers.
5. Initialize the warning and other modules.
6. Set up deprecated-method notices.

setupV8 deserves a mention, because it calls the V8 builtin functions we covered earlier. The code is as follows:

function setupV8() {
    // Warm up the map and set iterator preview functions.  V8 compiles
    // functions lazily (unless --nolazy is set) so we need to do this
    // before we turn off --allow_natives_syntax again.
    const v8 = NativeModule.require('internal/v8');
    // Warm up map/set iterators
    v8.previewMapIterator(new Map().entries());
    v8.previewSetIterator(new Set().entries());
    v8.previewWeakMap(new WeakMap(), 1);
    v8.previewWeakSet(new WeakSet(), 1);
    // Disable --allow_natives_syntax again unless it was explicitly
    // specified on the command line.
    // From here on, --allow_natives_syntax is disabled
    const re = /^--allow[-_]natives[-_]syntax$/;
    if (!process.execArgv.some((s) => re.test(s)))
      process.binding('v8').setFlagsFromString('--noallow_natives_syntax');
  }

v8.previewMapIterator, for example, uses a V8 builtin:

// Clone the provided Map Iterator.
function previewMapIterator(it) {
  // V8 built-in function; invoked from JS with a leading %.
  // Normally only called inside V8 itself; user JS must run with
  // the --allow-natives-syntax flag to use it.
  return %MapIteratorClone(it);
}

2. Execution

The execution-phase code is as follows:

// There is user code to be run
      // If this is a worker in cluster mode, start up the communication
      // channel. This needs to be done before any user code gets executed
      // (including preload modules).
      // If this is a cluster worker, it must be initialized first
      if (process.argv[1] && process.env.NODE_UNIQUE_ID) {
        perf.markMilestone(NODE_PERFORMANCE_MILESTONE_CLUSTER_SETUP_START);
        const cluster = NativeModule.require('cluster');
        // Instantiate the worker
        // Listen for disconnect, newconn, etc.
        cluster._setupWorker();
        perf.markMilestone(NODE_PERFORMANCE_MILESTONE_CLUSTER_SETUP_END);
        // Make sure it's not accidentally inherited by child processes.
        delete process.env.NODE_UNIQUE_ID;
      }

      if (process._eval != null && !process._forceRepl) {
        perf.markMilestone(NODE_PERFORMANCE_MILESTONE_MODULE_LOAD_START);
        perf.markMilestone(NODE_PERFORMANCE_MILESTONE_MODULE_LOAD_END);
        // User passed '-e' or '--eval' arguments to Node without '-i' or
        // '--interactive'

        perf.markMilestone(
          NODE_PERFORMANCE_MILESTONE_PRELOAD_MODULE_LOAD_START);
        preloadModules();
        perf.markMilestone(NODE_PERFORMANCE_MILESTONE_PRELOAD_MODULE_LOAD_END);

        const {
          addBuiltinLibsToObject
        } = NativeModule.require('internal/modules/cjs/helpers');
        // Attach 'assert', 'async_hooks', 'buffer', etc. to global
        addBuiltinLibsToObject(global);
        evalScript('[eval]');
      } else if (process.argv[1] && process.argv[1] !== '-') {
        perf.markMilestone(NODE_PERFORMANCE_MILESTONE_MODULE_LOAD_START);
        // make process.argv[1] into a full path
        const path = NativeModule.require('path');
        process.argv[1] = path.resolve(process.argv[1]);

        const CJSModule = NativeModule.require('internal/modules/cjs/loader');

        perf.markMilestone(NODE_PERFORMANCE_MILESTONE_MODULE_LOAD_END);
        perf.markMilestone(
          NODE_PERFORMANCE_MILESTONE_PRELOAD_MODULE_LOAD_START);
        preloadModules();
        perf.markMilestone(
          NODE_PERFORMANCE_MILESTONE_PRELOAD_MODULE_LOAD_END);
        // check if user passed `-c` or `--check` arguments to Node.
        if (process._syntax_check_only != null) {
          const fs = NativeModule.require('fs');
          // read the source
          // Locate the file
          const filename = CJSModule._resolveFilename(process.argv[1]);
          const source = fs.readFileSync(filename, 'utf-8');
          // Check the syntax; strip shebang, BOM, etc.
          checkScriptSyntax(source, filename);
          process.exit(0);
        }
        CJSModule.runMain();
      } else {
        perf.markMilestone(NODE_PERFORMANCE_MILESTONE_MODULE_LOAD_START);
        perf.markMilestone(NODE_PERFORMANCE_MILESTONE_MODULE_LOAD_END);
        perf.markMilestone(
          NODE_PERFORMANCE_MILESTONE_PRELOAD_MODULE_LOAD_START);
        preloadModules();
        perf.markMilestone(
          NODE_PERFORMANCE_MILESTONE_PRELOAD_MODULE_LOAD_END);
        // If -i or --interactive were passed, or stdin is a TTY.
        if (process._forceRepl || NativeModule.require('tty').isatty(0)) {
          // REPL
          const cliRepl = NativeModule.require('internal/repl');
          cliRepl.createInternalRepl(process.env, function(err, repl) {
            if (err) {
              throw err;
            }
            repl.on('exit', function() {
              if (repl._flushing) {
                repl.pause();
                return repl.once('flushHistory', function() {
                  process.exit();
                });
              }
              process.exit();
            });
          });

          if (process._eval != null) {
            // User passed '-e' or '--eval'
            evalScript('[eval]');
          }
        } else {
          // Read all of stdin - execute it.
          process.stdin.setEncoding('utf8');

          let code = '';
          process.stdin.on('data', function(d) {
            code += d;
          });

          process.stdin.on('end', function() {
            if (process._syntax_check_only != null) {
              checkScriptSyntax(code, '[stdin]');
            } else {
              process._eval = code;
              evalScript('[stdin]');
            }
          });
        }
      }

The main steps:

1. If this is a worker in cluster mode, initialize it first: instantiate a worker and listen for disconnect, newconn, and other messages.
2. Preload the modules the user specified (process._preload_modules).
3. Resolve the file's real path with CJSModule._resolveFilename.
4. Check the syntax; strip the shebang, BOM, etc.
5. Execute.

CJSModule

1. Module resolution

Module resolution relies mainly on CJSModule._resolveFilename, whose input is the requested module and whose output is the module's real path. Its code is shown below:

Module._resolveFilename = function(request, parent, isMain, options) {
  if (NativeModule.nonInternalExists(request)) {
    return request;
  }

  var paths;

  if (typeof options === 'object' && options !== null &&
      Array.isArray(options.paths)) {
    const fakeParent = new Module('', null);

    paths = [];

    for (var i = 0; i < options.paths.length; i++) {
      const path = options.paths[i];
      // node_modules directories under parent
      fakeParent.paths = Module._nodeModulePaths(path);
      const lookupPaths = Module._resolveLookupPaths(request, fakeParent, true);

      if (!paths.includes(path))
        paths.push(path);

      for (var j = 0; j < lookupPaths.length; j++) {
        if (!paths.includes(lookupPaths[j]))
          paths.push(lookupPaths[j]);
      }
    }
  } else {
    paths = Module._resolveLookupPaths(request, parent, true);
  }

  // look up the filename first, since that's the cache key.
  var filename = Module._findPath(request, paths, isMain);
  if (!filename) {
    // eslint-disable-next-line no-restricted-syntax
    var err = new Error(`Cannot find module '${request}'`);
    err.code = 'MODULE_NOT_FOUND';
    throw err;
  }
  return filename;
};

The code above first uses Module._resolveLookupPaths to enumerate all candidate lookup paths, then uses Module._findPath to search them for the module.

Module._resolveLookupPaths

The code of Module._resolveLookupPaths is as follows:

// 'index.' character codes
// Enumerate all the lookup paths.
// Returns an array: item 0 is the module name (request), item 1 an array
// of directories that may contain the module.
var indexChars = [ 105, 110, 100, 101, 120, 46 ];
var indexLen = indexChars.length;
Module._resolveLookupPaths = function(request, parent, newReturn) {
  // Native modules not under lib/internal
  if (NativeModule.nonInternalExists(request)) {
    debug('looking for %j in []', request);
    return (newReturn ? null : [request, []]);
  }

  // Modules not starting with '..' or './', i.e. require('moduleA')
  // Check for relative path
  if (request.length < 2 ||
      request.charCodeAt(0) !== CHAR_DOT ||
      (request.charCodeAt(1) !== CHAR_DOT &&
       request.charCodeAt(1) !== CHAR_FORWARD_SLASH)) {
    //0:"/Users/tsy/.node_modules"
    //1:"/Users/tsy/.node_libraries"
    //2:"/Users/tsy/.nvm/versions/node/v8.2.1/lib/node"
    var paths = modulePaths;
    if (parent) {
      if (!parent.paths)
        paths = parent.paths = [];
      else
        /**
         * 0:"/Users/tsy/devspace/mis/server/third-party/node_modules"
            1:"/Users/tsy/devspace/mis/server/node_modules"
            2:"/Users/tsy/devspace/mis/node_modules"
            3:"/Users/tsy/devspace/node_modules"
            4:"/Users/tsy/node_modules"
            5:"/Users/node_modules"
            6:"/node_modules"
         */
        paths = parent.paths.concat(paths);
    }

    // Maintain backwards compat with certain broken uses of require('.')
    // by putting the module's directory in front of the lookup paths.
    // require('.')
    if (request === '.') {
      if (parent && parent.filename) {
        paths.unshift(path.dirname(parent.filename));
      } else {
        paths.unshift(path.resolve(request));
      }
    }

    debug('looking for %j in %j', request, paths);
    return (newReturn ? (paths.length > 0 ? paths : null) : [request, paths]);
  }

  // with --eval, parent.id is not set and parent.filename is null
  if (!parent || !parent.id || !parent.filename) {
    // make require('./path/to/foo') work - normally the path is taken
    // from realpath(__filename) but with eval there is no filename
    var mainPaths = ['.'].concat(Module._nodeModulePaths('.'), modulePaths);

    debug('looking for %j in %j', request, mainPaths);
    return (newReturn ? mainPaths : [request, mainPaths]);
  }
  
  ......
  
  return (newReturn ? parentDir : [id, parentDir]);
};

The comments above already cover most of it; a few points deserve attention:

modulePaths is derived from the HOME and NODE_PATH environment variables; on my machine, for example, it is:

0:"/Users/tsy/.node_modules"
1:"/Users/tsy/.node_libraries"
2:"/Users/tsy/.nvm/versions/node/v8.2.1/lib/node"

parent.paths is passed in when resolveFilename calls this function; it lists every node_modules directory from the current directory up to the root, obtained as follows:

fakeParent.paths = Module._nodeModulePaths(path);

Module._nodeModulePaths = function(from) {
    // guarantee that 'from' is absolute.
    from = path.resolve(from);
    // Return early not only to avoid unnecessary work, but to *avoid* returning
    // an array of two items for a root: [ '//node_modules', '/node_modules' ]
    if (from === '/')
      return ['/node_modules'];

    // note: this approach *only* works when the path is guaranteed
    // to be absolute.  Doing a fully-edge-case-correct path.split
    // that works on both Windows and Posix is non-trivial.
    const paths = [];
    var p = 0;
    var last = from.length;
    for (var i = from.length - 1; i >= 0; --i) {
      const code = from.charCodeAt(i);
      if (code === CHAR_FORWARD_SLASH) {
        if (p !== nmLen)
          paths.push(from.slice(0, last) + '/node_modules');
        last = i;
        p = 0;
      } else if (p !== -1) {
        if (nmChars[p] === code) {
          ++p;
        } else {
          p = -1;
        }
      }
    }

    // Append /node_modules to handle root paths.
    paths.push('/node_modules');

    return paths;
  };

Module._nodeModulePaths walks up from the starting directory, appending node_modules at every level.
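That walk can be re-implemented in a few lines. This sketch handles POSIX-style paths only and ignores the special-casing of directories already named node_modules that the real code performs:

```javascript
// Simplified re-implementation of the Module._nodeModulePaths walk:
// append /node_modules at each ancestor, ending at the root.
function nodeModulePaths(from) {
  if (from === '/') return ['/node_modules'];
  const parts = from.split('/').filter(Boolean);
  const paths = [];
  for (let i = parts.length; i > 0; i--) {
    paths.push('/' + parts.slice(0, i).join('/') + '/node_modules');
  }
  paths.push('/node_modules');  // handle the root path
  return paths;
}
```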

Module._findPath

Module._findPath searches the lookup directories listed above for the module. The code is as follows:

// File lookup
// Priority for a concrete file within each candidate directory:
// 1. the exact file
// 2. the file with an extension appended
// 3. the "main" field of package.json
// 4. index with an extension appended
// Candidates are the current directory, Node's system directories, and
// node_modules directories; the first hit in the order above is returned
var warned = false;
Module._findPath = function(request, paths, isMain) {
  if (path.isAbsolute(request)) {
    paths = [''];
  } else if (!paths || paths.length === 0) {
    return false;
  }

  var cacheKey = request + '\x00' +
                (paths.length === 1 ? paths[0] : paths.join('\x00'));
  var entry = Module._pathCache[cacheKey];
  if (entry)
    return entry;

  var exts;
  var trailingSlash = request.length > 0 &&
    request.charCodeAt(request.length - 1) === CHAR_FORWARD_SLASH;
  if (!trailingSlash) {
    trailingSlash = /(?:^|\/)\.?\.$/.test(request);
  }

  // For each path
  // Walk the paths one by one
  for (var i = 0; i < paths.length; i++) {
    // Don't search further if path doesn't exist
    const curPath = paths[i];
    if (curPath && stat(curPath) < 1) continue;
    var basePath = path.resolve(curPath, request);
    var filename;

    var rc = stat(basePath);
    if (!trailingSlash) {
      // Try the exact path
      if (rc === 0) {  // File.
        if (preserveSymlinks && !isMain) {
          filename = path.resolve(basePath);
        } else {
          filename = toRealPath(basePath);
        }
      }

      // With extensions appended
      if (!filename) {
        // try it with each of the extensions
        if (exts === undefined)
          exts = Object.keys(Module._extensions);
        filename = tryExtensions(basePath, exts, isMain);
      }
    }

    if (!filename && rc === 1) {  // Directory.
      // try it with each of the extensions at "index"
      if (exts === undefined)
        exts = Object.keys(Module._extensions);
      // Try the "main" field of package.json
      filename = tryPackage(basePath, exts, isMain);
      if (!filename) {
        filename = tryExtensions(path.resolve(basePath, 'index'), exts, isMain);
      }
    }

    if (filename) {
      // Warn once if '.' resolved outside the module dir
      if (request === '.' && i > 0) {
        if (!warned) {
          warned = true;
          process.emitWarning(
            'warning: require(\'.\') resolved outside the package ' +
            'directory. This functionality is deprecated and will be removed ' +
            'soon.',
            'DeprecationWarning', 'DEP0019');
        }
      }

      Module._pathCache[cacheKey] = filename;
      return filename;
    }
  }
  return false;
};

All the candidate directories are traversed and the module is looked up within each; inside every lookup directory, candidates follow a priority order:

1. the exact file
2. the file with an extension appended
3. the "main" field of package.json
4. index with an extension appended
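The priority order can be sketched against a toy in-memory file map. The names `findInDir` and `files` are illustrative, not Node APIs:

```javascript
// Toy resolver: `files` maps paths to either `true` (a file exists) or,
// for package.json entries, a parsed object with a `main` field.
function findInDir(files, base) {
  const exts = ['.js', '.json', '.node'];
  if (files[base]) return base;                                // 1. exact file
  for (const e of exts) if (files[base + e]) return base + e;  // 2. extension
  const pkg = files[base + '/package.json'];                   // 3. pkg main
  if (pkg && pkg.main && files[base + '/' + pkg.main]) {
    return base + '/' + pkg.main;
  }
  for (const e of exts) {                                      // 4. index.*
    if (files[base + '/index' + e]) return base + '/index' + e;
  }
  return false;
}
```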

2. Execution

Execution goes through CJSModule.runMain(), which calls Module._load(). The code of Module._load is as follows:

Module._load = function(request, parent, isMain) {
  if (parent) {
    debug('Module._load REQUEST %s parent: %s', request, parent.id);
  }

  if (experimentalModules && isMain) {
    asyncESM.loaderPromise.then((loader) => {
      return loader.import(getURLFromFilePath(request).pathname);
    })
    .catch((e) => {
      decorateErrorStack(e);
      console.error(e);
      process.exit(1);
    });
    return;
  }

  // Resolve the file path
  var filename = Module._resolveFilename(request, parent, isMain);

  // Check the cache
  var cachedModule = Module._cache[filename];
  if (cachedModule) {
    updateChildren(parent, cachedModule, true);
    return cachedModule.exports;
  }

  if (NativeModule.nonInternalExists(filename)) {
    debug('load native module %s', request);
    return NativeModule.require(filename);
  }

  // Don't call updateChildren(), Module constructor already does.
  var module = new Module(filename, parent);

  if (isMain) {
    process.mainModule = module;
    module.id = '.';
  }

  Module._cache[filename] = module;

  tryModuleLoad(module, filename);

  return module.exports;
};

The process resembles compile in NativeModule:

1. Check the cache.
2. Resolve the file path.
3. Instantiate a CJSModule.
4. Execute.

Execution ultimately calls CJSModule._compile, and CJSModule._compile in turn calls vm.runInThisContext.

// create wrapper function
// wrap
var wrapper = Module.wrap(content);

var compiledWrapper = vm.runInThisContext(wrapper, {
  filename: filename,
  lineOffset: 0,
  displayErrors: true
});

vm.runInThisContext simply calls ContextifyScript's runInThisContext described earlier in this article. In brief:

class Script extends ContextifyScript {
  ......

  runInThisContext(options) {
    const { breakOnSigint, args } = getRunInContextArgs(options);
    if (breakOnSigint && process.listenerCount('SIGINT') > 0) {
      return sigintHandlersWrap(super.runInThisContext, this, args);
    } else {
      return super.runInThisContext(...args);
    }
  }

  ......
}

Summary

Starting from loader.js and node.js under lib/internal, this article walked through how JS code actually gets executed, covering module resolution, vm, contextify, and related topics along the way.

Assorted questions about computer systems (ongoing updates)

  1. Endianness

    Endianness is how memory orders the high and low bytes of a value. For processors wider than 8 bits — 16-bit or 32-bit ones, say — registers are wider than a byte, so the bytes must be laid out in some order, which gives rise to big-endian and little-endian storage. Endianness is a property of the CPU architecture rather than of any particular chip: the CPU merely executes the program, and it is the architecture that dictates how instructions are fetched and how data is stored.
    Because memory for global variables is laid out at compile time, the compiler must know whether it targets big- or little-endian; for the same reason, the operating system must know the endianness as well.
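From JS the host's byte order can be observed with typed arrays — a `Uint16Array` uses native byte order, while `DataView` lets you pick one explicitly:

```javascript
// Write 0x0102 big-endian explicitly: the high byte 0x01 must come first.
const buf = new ArrayBuffer(2);
new DataView(buf).setUint16(0, 0x0102, false);  // false = big-endian
const bigEndianFirstByte = new Uint8Array(buf)[0];

// A Uint16Array uses the machine's native order; inspect which byte
// landed first to determine the host's endianness.
const native = new Uint16Array([0x0102]);
const endianness = new Uint8Array(native.buffer)[0] === 0x02 ? 'LE' : 'BE';
```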

  2. Why machine code depends on the operating system

    Machine code is, in principle, just instructions the CPU can decode — so why does the same hardware yield different binaries under different operating systems? Because a binary depends on the OS's ABI, machine code varies with the OS.

  3. Why computers store numbers in two's complement

    Two's complement is the ones' complement plus 1. Ones' complement was introduced so that a number plus its negation gives 0; two's complement then eliminates the problem of having two representations of 0.
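A quick 8-bit illustration using JS bitwise operators, masking each result to a single byte:

```javascript
// Two's complement in 8 bits: negation = ones' complement (~x) plus 1.
const toByte = (n) => n & 0xFF;

const neg5 = toByte(~5 + 1);   // 0b11111011 = 251, i.e. -5 in 8-bit
const sum = toByte(5 + neg5);  // 5 + (-5) wraps around to 0
const negZero = toByte(~0 + 1);// negating 0 still gives 0: only one zero
```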

  4. How software and hardware cooperate on virtual memory

    Virtual memory needs hardware and software to cooperate: its cache-management scheme is implemented by the OS kernel, the address-translation hardware in the MMU (memory management unit), and a per-process page table data structure held in main memory.

    The page table lives in main memory, and each process maintains its own. It is the data structure that manages the mapping and cache state between virtual and physical pages. Logically it is an array whose basic element is the page table entry (PTE): the array index corresponds to the virtual page number and the value to the physical page number, with a few bits set aside for a valid bit and permission bits. A valid bit of 1 means the virtual page is cached; a valid bit of 0 with a null value means the page is unallocated; a valid bit of 0 with a non-null value means the virtual page has been allocated but not yet cached in a physical page. The permission bits cover readable, writable, and whether root privileges are required.

  5. Why data alignment exists

    Many computer systems restrict the legal addresses of primitive types, requiring that an object of a given type live at an address that is a multiple of some value K. The restriction — really a recommendation — exists to improve memory-system performance. For example, if a processor always fetches 8 bytes from memory at a time, then guaranteeing that every double sits at an address divisible by 8 lets each value be read or written in a single memory operation. To achieve alignment, the compiler inserts gaps into a struct's layout or pads its end.

  6. Why stack frames save %rbp (the frame base)

    A fixed-size frame's length is known at compile time, but a variable-size frame (one calling alloca, say, which allocates on the stack) needs %rbp to record the frame's base address. At the end of the function, the leave instruction restores the stack pointer %rsp to %rbp's value and pops the saved base back into the %rbp register.

  7. How device drivers work

    A driver sits between the operating system and a device controller; it is the interface between the OS kernel and the hardware.

    When a user process issues read/write system calls on a device file, the system call finds the corresponding driver through the file's major device number, reads the function pointers out of that data structure, and hands control to the chosen function — this is the basic working principle of Linux device drivers.

    A driver calls int register_chrdev(unsigned int major, const char * name, struct file_operations *fops) to store the vendor's interface functions fops in the chrdevs array; the major device number mentioned above is the array index, and when the OS accesses a device it locates the driver through that index.

  8. How debugger breakpoints work

    Ordinary breakpoints: the debugger overwrites the first byte at the breakpoint address with int3 (0xcc, the software-interrupt mechanism) and saves the original byte in a breakpoint table it maintains. When execution reaches the address the program traps and raises an exception; the debugger catches it and pauses the program at the breakpoint address (the instruction there has not yet run). When the breakpoint instruction does execute, it is not fetched entirely from memory — the first byte there is now 0xcc — so the actual instruction is assembled from the original byte saved in the breakpoint table plus the bytes that follow in memory. After it runs, as long as the breakpoint remains set, the first byte in memory stays 0xcc. When a breakpoint is deleted, the debugger looks up the original byte by address in the table and restores it to memory.

    When generating machine code, the compiler also emits debug information describing the relationship between the executable and its source, stored alongside the machine code in a predefined format.

  9. The TCP/IP backlog parameter

    backlog is the size of the connection queues — the number of connection fds — covering both the half-open queue and the fully-established queue.

    Half-open: while the server is in the Listen state, an incoming SYN from the client puts the connection into the half-open (SYN) queue (server-side port state: SYN_RCVD).

    Fully established: from the server's SYN+ACK response until the client's ACK arrives at the server, the connection stays in the half-open queue; once the server receives the client's ACK, the entry moves from the half-open queue to the tail of the accept queue (server-side port state: ESTABLISHED).

  10. How dlopen and dlsym work

    dlopen and dlsym are the calls Linux provides for loading dynamic shared libraries.

    dlopen opens a shared library, maps it into the calling process's address space, runs its initialization, and returns a handle to the global symbol table (generally stored in the process's data segment). dlopen supports two modes, RTLD_LAZY and RTLD_NOW: RTLD_LAZY defers relocation until a global function or variable is actually used, while RTLD_NOW relocates immediately, so it loads more slowly than RTLD_LAZY.

    dlsym then finds the requested symbol via the global-symbol-table handle that dlopen returned.

  11. When signals get handled

    Signals are handled when the program transitions from kernel mode back to user mode. The signal sets comprise the blocked set and the pending set; while a signal sits in the pending set awaiting delivery, further instances of the same signal are not queued — they are simply discarded.

  12. P/V operations on semaphores

    A semaphore is a special variable — in effect a storage unit representing some resource, used to control process state. It takes the form of an integer S plus a queue.

    struct semaphore {
        raw_spinlock_t                lock;
        unsigned int              count;
        struct list_head   wait_list;
    };
    
    void P(semaphore s)
    {
        s.value--; // Acquire; if value was already 0 it becomes -1, meaning one process now waits
        if(s.value < 0)
        {
            add this process to the wait queue and wait;
            block(s.L);
        }
    }
    
    void V(semaphore s)
    {
        s.value++;
        if(s.value <= 0)
        {
            remove process P from the wait queue;
            wakeup(P); // wake P so it can get back to work
        }
    }
    

    The list_head field of the semaphore structure is the queue of threads waiting on the semaphore; each list entry is a process or thread control block.

    block marks the current process as blocked, stops it, and inserts its PCB into the wait queue for the corresponding event.

    wakeup takes one PCB off the wait queue, marks it ready, and inserts it into the ready queue to await the scheduler.
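The P/V pseudocode above can be mirrored with a small callback-based semaphore. This is synchronous and illustrative only — a real semaphore blocks threads via the kernel:

```javascript
// `count` mirrors the semaphore's value; `waiters` mirrors wait_list,
// holding the tasks of "blocked" callers. All names are illustrative.
function makeSemaphore(count) {
  const waiters = [];
  return {
    p(task) {                       // P: acquire, or queue the task
      count--;
      if (count < 0) waiters.push(task);
      else task();
    },
    v() {                           // V: release, waking one waiter if any
      count++;
      if (count <= 0) waiters.shift()();
    }
  };
}

const log = [];
const sem = makeSemaphore(1);
sem.p(() => log.push('first'));     // count 1 -> 0: runs immediately
sem.p(() => log.push('second'));    // count 0 -> -1: queued
sem.v();                            // wakes the queued task
```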

  13. Multi-process, I/O multiplexing, multi-threading

    Multi-process, I/O multiplexing, and multi-threading are three concurrency models found in computer systems.

    In the multi-process model the parent accepts requests and child processes do the actual work. Children are created by the parent's fork() and each owns an independent virtual address space, which makes data sharing awkward — shared data must go through IPC; parent and child can also end up racing, which has to be avoided with atomic operations.

    The I/O multiplexing model creates concurrent logical flows within a single process and schedules them itself; poll and epoll are common examples. epoll keeps a red-black tree of fds in a kernel buffer, and user-space logic retrieves the array of ready fds via epoll_wait. Because this model does its own scheduling, any long CPU-bound operation blocks the scheduler — which is why Node is a poor fit for CPU-intensive work — and the model cannot fully exploit multiple cores either.

    The thread-based model mixes the kernel scheduling above with data sharing: threads are scheduled by the kernel yet run inside a single process and share everything except the stack and registers (heap, code segment, data segment, and so on). Because threads share the process's variables, synchronization bugs follow, and shared variables must be protected with semaphores. Threads can exploit multiple cores, but heavy variable protection (synchronization) drags performance down.

  14. Why threads came about

    Chiefly because context switches are cheap and variables are shared; moreover, threads are not organized in a parent-child hierarchy the way processes are — all threads within a process are peers.

Node.js source - creating and initializing global & process

global and process are two objects every Node developer uses. How are they created, and why are they available in our code? This article answers both questions.

global

The global object is created in src/node.cc and initialized in bootstrap/node.js.

Creation

In the LoadEnvironment method of src/node.cc, the following lines create the global object.

// Add a reference to the global object
  Local<Object> global = env->context()->Global();
  
  ......
  // Expose the global object as a property on itself
  // (Allows you to set stuff on `global` from anywhere in JavaScript.)
  global->Set(FIXED_ONE_BYTE_STRING(env->isolate(), "global"), global);

其中env->context()->Global()获取了当前context中的Global全局对象,global->Set(FIXED_ONE_BYTE_STRING(env->isolate(), "global"), global)将全局对象本身挂载在其global对象上,这样全局对象中有了一个global对象指向了全局对象本身,我们在该context中可以直接使用global对全局对象的引用进行访问。

经过以上逻辑,我们就可以在该context的任何地方对全局对象进行操作。

初始化

初始化在bootstrap/node.js中进行,其在上述逻辑执行后被执行,所以可以直接操作全局对象。

初始化代码分为以下几个部分:

1.为global挂载process、Symbol.toStringTag、buffer等属性

//为global挂载Symbol.toStringTag、buffer等属性
setupGlobalVariables();

function setupGlobalVariables() {
    // global.toString()时访问
    Object.defineProperty(global, Symbol.toStringTag, {
      value: 'global',
      writable: false,
      enumerable: false,
      configurable: true
    });
    global.process = process;
    const util = NativeModule.require('util');

    function makeGetter(name) {
      return util.deprecate(function() {
        return this;
      }, `'${name}' is deprecated, use 'global'`, 'DEP0016');
    }

    function makeSetter(name) {
      return util.deprecate(function(value) {
        Object.defineProperty(this, name, {
          configurable: true,
          writable: true,
          enumerable: true,
          value: value
        });
      }, `'${name}' is deprecated, use 'global'`, 'DEP0016');
    }

    Object.defineProperties(global, {
      GLOBAL: {
        configurable: true,
        get: makeGetter('GLOBAL'),
        set: makeSetter('GLOBAL')
      },
      root: {
        configurable: true,
        get: makeGetter('root'),
        set: makeSetter('root')
      }
    });

    // This, as side effect, removes `setupBufferJS` from the buffer binding,
    // and exposes it on `internal/buffer`.
    NativeModule.require('internal/buffer');

    global.Buffer = NativeModule.require('buffer').Buffer;
    process.domain = null;
    process._exiting = false;
  }

这里将process挂载在global上,这也是为什么我们可以在用户代码中直接访问process对象的原因。同时注意,Symbol.toStringTag属性会在对global调用Object.prototype.toString时被读取。

2.初始化timeout、console、URL

const browserGlobals = !process._noBrowserGlobals;
if (browserGlobals) {
  setupGlobalTimeouts();
  setupGlobalConsole();
  setupGlobalURL();
}

function setupGlobalTimeouts() {
    const timers = NativeModule.require('timers');
    global.clearImmediate = timers.clearImmediate;
    global.clearInterval = timers.clearInterval;
    global.clearTimeout = timers.clearTimeout;
    global.setImmediate = timers.setImmediate;
    global.setInterval = timers.setInterval;
    global.setTimeout = timers.setTimeout;
  }

这里以setupGlobalTimeouts为例,主要是在global上挂载setImmediate等timeout相关的方法。

3.为global加上'assert', 'async_hooks', 'buffer'等属性

const {
  addBuiltinLibsToObject
} = NativeModule.require('internal/modules/cjs/helpers');
// 为global加上'assert', 'async_hooks', 'buffer'等属性
addBuiltinLibsToObject(global);

这里调用的是internal/modules/cjs/helpers下的addBuiltinLibsToObject方法,代码如下:

const builtinLibs = [
  'assert', 'async_hooks', 'buffer', 'child_process', 'cluster', 'crypto',
  'dgram', 'dns', 'domain', 'events', 'fs', 'http', 'http2', 'https', 'net',
  'os', 'path', 'perf_hooks', 'punycode', 'querystring', 'readline', 'repl',
  'stream', 'string_decoder', 'tls', 'trace_events', 'tty', 'url', 'util',
  'v8', 'vm', 'zlib'
];

if (typeof process.binding('inspector').open === 'function') {
  builtinLibs.push('inspector');
  builtinLibs.sort();
}

function addBuiltinLibsToObject(object) {
  // Make built-in modules available directly (loaded lazily).
  builtinLibs.forEach((name) => {
    // Goals of this mechanism are:
    // - Lazy loading of built-in modules
    // - Having all built-in modules available as non-enumerable properties
    // - Allowing the user to re-assign these variables as if there were no
    //   pre-existing globals with the same name.

    const setReal = (val) => {
      // Deleting the property before re-assigning it disables the
      // getter/setter mechanism.
      delete object[name];
      object[name] = val;
    };

    Object.defineProperty(object, name, {
      get: () => {
        const lib = require(name);

        // Disable the current getter/setter and set up a new
        // non-enumerable property.
        delete object[name];
        Object.defineProperty(object, name, {
          get: () => lib,
          set: setReal,
          configurable: true,
          enumerable: false
        });

        return lib;
      },
      set: setReal,
      configurable: true,
      enumerable: false
    });
  });
}

其中builtinLibs主要包含buffer、assert等常用的对象或方法。
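addBuiltinLibsToObject的核心是一个"首次访问才真正加载"的getter,其机制可以简化成如下示意(addLazy为假设的函数名):

```javascript
function addLazy(object, name, loader) {
  const setReal = (val) => {
    // 先删除 getter/setter 再直接赋值,这样用户的覆盖会生效
    delete object[name];
    object[name] = val;
  };
  Object.defineProperty(object, name, {
    get: () => {
      const lib = loader(); // 第一次访问时才真正加载
      delete object[name];
      // 换成直接返回缓存结果的 getter,后续访问不再触发 loader
      Object.defineProperty(object, name, {
        get: () => lib,
        set: setReal,
        configurable: true,
        enumerable: false
      });
      return lib;
    },
    set: setReal,
    configurable: true,
    enumerable: false
  });
}
```

使用时,定义属性不会触发加载,首次读取才会调用loader,之后的读取直接返回缓存的结果。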

process

process在src/env.cc的Environment::Start方法中被创建和初始化,同时在bootstrap/node.js也初始化了process对象。Environment::Start的调用函数是src/node.cc中的inline函数Start。

创建

在Environment::Start中初始化process的逻辑如下:

auto process_template = FunctionTemplate::New(isolate());
  process_template->SetClassName(FIXED_ONE_BYTE_STRING(isolate(), "process"));

  auto process_object =
      process_template->GetFunction()->NewInstance(context()).ToLocalChecked();
  // 初始化时声明的Persistent handle,Persistent v8::Object process_object
  // 这里利用process_object.Reset(),最终调用v8的PersistentBase<T>::Reset给Persistent handle重新赋值
  set_process_object(process_object);

这里面主要做了如下几件事:

1.声明函数模版process_template,设置其类名为process
2.获取对象实例process_object
3.调用set_process_object,将上述实例process_object赋值给初始化时声明的Persistent handle(Persistent v8::Object process_object),也就是process_object这个持久化的handle指向新创建的process_object实例。

这里需要注意的是set_process_object是从哪里来的呢?

在env-inl.h中有下面的宏定义:

#define V(PropertyName, TypeName)                                             \
  inline v8::Local<TypeName> Environment::PropertyName() const {              \
    return StrongPersistentToLocal(PropertyName ## _);                        \
  }                                                                           \
  inline void Environment::set_ ## PropertyName(v8::Local<TypeName> value) {  \
    PropertyName ## _.Reset(isolate(), value);                                \
  }
  ENVIRONMENT_STRONG_PERSISTENT_PROPERTIES(V)
#undef V

#define ENVIRONMENT_STRONG_PERSISTENT_PROPERTIES(V)                           \
                                \
  V(process_object, v8::Object)                                               \
  V(promise_reject_handled_function, v8::Function)                            \
  V(promise_reject_unhandled_function, v8::Function)                          \
  V(promise_wrap_template, v8::ObjectTemplate)                                \
  V(push_values_to_array_function, v8::Function)                              \
  V(randombytes_constructor_template, v8::ObjectTemplate)                     \
  V(script_context_constructor_template, v8::FunctionTemplate)                \
  V(script_data_constructor_function, v8::Function)                           \
  V(secure_context_constructor_template, v8::FunctionTemplate)                \
  V(shutdown_wrap_template, v8::ObjectTemplate)                               \
  V(tcp_constructor_template, v8::FunctionTemplate)                           \
  V(tick_callback_function, v8::Function)                                     \
  V(timers_callback_function, v8::Function)                                   \
  V(tls_wrap_constructor_function, v8::Function)                              \
  V(tty_constructor_template, v8::FunctionTemplate)                           \
  V(udp_constructor_function, v8::Function)                                   \
  V(vm_parsing_context_symbol, v8::Symbol)                                    \
  V(url_constructor_function, v8::Function)                                   \
  V(write_wrap_template, v8::ObjectTemplate)

从这里我们可以看出,set_process_object在这里定义,其真实调用的是process_object_.Reset方法,那么PersistentBase<T>::Reset方法又做了什么呢?

void PersistentBase<T>::Reset(Isolate* isolate, const Local<S>& other) {
  TYPE_CHECK(T, S);
  Reset();
  if (other.IsEmpty()) return;
  this->val_ = New(isolate, other.val_);
}

这里Reset方法在PersistentBase类下,这里顾名思义PersistentBase是Persistent handle的基类,调用Reset方法也就是重新给其赋值。

那么开始的Persistent handle process_object又是怎么来的呢?我们看env.h中的如下代码:

#define V(PropertyName, TypeName)                                             \
  inline v8::Local<TypeName> PropertyName() const;                            \
  inline void set_ ## PropertyName(v8::Local<TypeName> value);
  ENVIRONMENT_STRONG_PERSISTENT_PROPERTIES(V)
#undef V

其中ENVIRONMENT_STRONG_PERSISTENT_PROPERTIES这个宏我们上面见过,就是调用了多次V,只不过这里的V代表V8的handle声明,用在process_object上其实就是如下代码:

inline v8::Local<v8::Object> process_object() const

也就是声明了Persistent handle process_object,也就有了我们后面的利用PersistentBase::Reset进行赋值的操作了。

初始化

process的初始化出现在src/env.cc和bootstrap/node.js。

在src/env.cc中,其实调用的是node.cc中的SetupProcessObject方法,初始化代码比较长,下面摘出了比较具有代表性的代码:

Local<Object> process = env->process_object();

  auto title_string = FIXED_ONE_BYTE_STRING(env->isolate(), "title");
  // 设置process的存取器
  CHECK(process->SetAccessor(env->context(),
                             title_string,
                             ProcessTitleGetter,
                             ProcessTitleSetter,
                             env->as_external()).FromJust());

  // process.version
  // 参数为 obj, name, value
  READONLY_PROPERTY(process,
                    "version",
                    FIXED_ONE_BYTE_STRING(env->isolate(), NODE_VERSION));

这里首先从env中获取了process_object,然后设置了一个process.title的存取器,用来获取和设置title,这里的title获取调用了uv_get_process_title,实际上就是开辟了堆内存,copy了一份process_argv[0],也就是入口main函数的argv[0];接着使用READONLY_PROPERTY这个宏设置了process的只读属性version。READONLY_PROPERTY宏定义如下所示:

#define READONLY_PROPERTY(obj, name, value)                                   \
  do {                                                                        \
    obj->DefineOwnProperty(env->context(),                                    \
                           FIXED_ONE_BYTE_STRING(isolate, name),              \
                           value, ReadOnly).FromJust();                       \
  } while (0)

在bootstrap/node.js中初始化process的代码如下:

setupProcessObject();

// do this good and early, since it handles errors.
setupProcessFatal();

// 国际化
setupProcessICUVersions();

......

Object.defineProperty(process, 'argv0', {
  enumerable: true,
  configurable: false,
  value: process.argv[0]
});
process.argv[0] = process.execPath;

......

function setupProcessObject() {
    // set_push_values_to_array_function, node.cc
    process._setupProcessObject(pushValueToArray);

    function pushValueToArray() {
      for (var i = 0; i < arguments.length; i++)
        this.push(arguments[i]);
    }
}

......

function setupProcessFatal() {
    const {
      executionAsyncId,
      clearDefaultTriggerAsyncId,
      clearAsyncIdStack,
      hasAsyncIdStack,
      afterHooksExist,
      emitAfter
    } = NativeModule.require('internal/async_hooks');

    process._fatalException = function(er) {
      
      ......
      
      return true;
    };
}

这里主要初始化了process的set_push_values_to_array_function、execPath、错误处理以及国际化版本等。

总结

本文主要讲解了global、process的创建、初始化,以及为什么我们在我们的用户代码中可以对他们进行直接的访问。

初识Electron

最近两天对electron应用做了一些性能分析,用到了VMtools、visual studio等工具。当然这里不是为了说这些:由于之前没有写过electron,对其实现也不了解,但在测试过程中发现了两个比较有趣的地方。

electron .与electron main.js启动时,appPath竟然不同

表象如下:

一个是/Users/tsy/devspace/electron-quick-start/,另一个则是/Users/tsy/devspace/electron-quick-start/node_modules/electron/dist/Electron.app/Contents/Resources/default_app.asar

分析过程:

从后向前追溯electron源码,发现appPath在App::SetAppPath中被赋值。

在electron中有两处调用了App::SetAppPath,一个是在init.js中,一个是在default_app/main.js中。

App::SetAppPath第一次调用是在init.js中,init.js是electron初始化的逻辑,在Node中bootstrap_node.js中被加载,init.js中获取appPath的代码如下:

我们可以看出,其在Resource下依次查找app、default_app和default_app.asar,也就是到目前为止,appPath为/Users/tsy/devspace/electron-quick-start/node_modules/electron/dist/Electron.app/Contents/Resources/default_app.asar。

我们再来看App::SetAppPath的第二次调用,其在default_app/main.js中,获取path的代码如下:

这里我们看到,如果执行目录中有package.json,则将当前目录设置成appPath;如果没有,则不会改变原有appPath

结论:

我们经过上述分析可以看出

  • 当我们调用electron .时,.表示当前目录,其中有package.json,所以当前目录被设置成appPath
  • 当我们调用electron main.js时,main.js所在目录下的package.json不存在,appPath保持为/Users/tsy/devspace/electron-quick-start/node_modules/electron/dist/Electron.app/Contents/Resources/default_app.asar

如何加载.asar中的文件

在electron中,为了保护应用层代码,会将程序员写的代码进行asar打包,那么如何加载.asar中的文件呢?

Electron/asar-require可以帮我们做到,加载此模块后就可以直接使用require去加载.asar中的文件,它是如何做到的呢?

答案也比较简单,asar-require重写了node require过程中用到的有关fs方法,特别是fs.readFileSync。

fs.readFileSync在node require中的作用是读取相应文件路径上的文件内容,asar-require正是覆写了该方法,拦截对.asar路径的读取。

RPC实现原理

在前端业务越来越向后扩展的情况下,RPC的调用也变成了我们获取数据重要的一部分,所以本文主要介绍RPC及其基本原理,主要有以下三部分:

1.RPC client/server的搭建及使用

2.client/server是如何处理RPC请求和调用的,其中包括我们每次用thrift命令生成的service和types到底是干嘛的

3.RPC的原理总结

RPC client/server的搭建及使用

RPC client的搭建和使用大家可以看官网上的实现就好了

官网:https://thrift.apache.org/tutorial/nodejs

client和server的业务逻辑建议大家简单搭建一个client和server看一下,我这里以官网上的例子为例进行分析:

client:

1.createConnection

var connection = thrift.createConnection("localhost", 9090, {
  transport : transport,
  protocol : protocol
});

这里的协议主要包括JSON, XML, plain text, compact binary

2.createClient

var Calculator = require('./gen-nodejs/Calculator');
......
var client = thrift.createClient(Calculator, connection);

这里的createClient其实就是实例化了Calculator.Client对象,以下是代码:

new Calculator.Client(connection.transport, connection.protocol)

Calculator.Client构造函数如下:

var CalculatorClient = exports.Client = function(output, pClass) {
    this.output = output;
    this.pClass = pClass;
    this._seqid = 0;
    this._reqs = {};
};

大家首先关注下_seqid属性,该属性在rpc中很重要。试想一下,我们调用了某个方法并传入callback,当结果返回时执行callback,那么程序异步获取result后,如何知道其对应的callback呢?这就是_seqid和_reqs的作用:RPC在this._reqs[seqid]中存储每次方法调用的callback,每次调用的seqid在this._seqid基础上递增而来。
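_seqid与_reqs的配对机制可以抽象成如下示意(PendingCalls为假设的名称):

```javascript
class PendingCalls {
  constructor() {
    this._seqid = 0;
    this._reqs = {};
  }
  // 发送请求前:递增 seqid 并记下 callback,seqid 随消息一起发给 server
  add(callback) {
    const id = ++this._seqid;
    this._reqs[id] = callback;
    return id;
  }
  // 响应回来后:按消息中携带的 seqid 取出对应 callback 并删除
  resolve(seqid, err, result) {
    const cb = this._reqs[seqid] || function () {};
    delete this._reqs[seqid];
    cb(err, result);
  }
}
```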

上面client对象中包含了output对象,对象中包含了thrift的接口,thrift协议规定了传输数据和内存中的变量之间的转换及其序列化、反序列化。这些接口可以在文档https://thrift.apache.org/docs/concepts中找到。下面只简单介绍下thrift的`writeMessageBegin`方便大家理解。

 public void writeMessageBegin(TMessage message) throws TException {  
    if (strictWrite_) {//判断是否强制写入版本号,是  
      int version = VERSION_1 | message.type;  
      writeI32(version);//写入版本号  
      writeString(message.name);//写入功能方法的名称  
      writeI32(message.seqid);//写入客户端的标识,这个标识是自动增加的  
    } else {//否  
      writeString(message.name);//写入功能方法的名称  
      writeByte(message.type);//写入类型  
      writeI32(message.seqid);//写入客户端的标识,这个标识是自动增加的  
    }  
}  

3.client[method](args, callback)

下面以caculate方法为例:

方法首先调用CalculatorClient.prototype.calculate(),其中主要代码如下:

this._reqs[this.seqid()] = callback;
this.send_calculate(logid, w);

this._reqs[this.seqid()]即咱们上面所说的存储callback的地方。

send_calculate()代码如下:

var output = new this.pClass(this.output);
output.writeMessageBegin('calculate', Thrift.MessageType.CALL, this.seqid());
var params = {
	logid: logid,
	w: w
};
var args = new Calculator_calculate_args(params);
args.write(output);
output.writeMessageEnd();
return this.output.flush();

首先调用output.writeMessageBegin()接口,表示消息的开始;
接着args.write(output),其实就是运用thrift接口,把方法所需参数传递过去,代码如下:

	output.writeStructBegin('Calculator_calculate_args');
	if (this.logid !== null && this.logid !== undefined) {
		output.writeFieldBegin('logid', Thrift.Type.I32, 1);
		output.writeI32(this.logid);
		output.writeFieldEnd();
	}
	if (this.w !== null && this.w !== undefined) {
		output.writeFieldBegin('w', Thrift.Type.STRUCT, 2);
		this.w.write(output);
		output.writeFieldEnd();
	}
	output.writeFieldStop();
	output.writeStructEnd();
	return;

最后调用flush清空缓冲区,将数据发出。

当结果返回后,RPC将调用CalculatorClient.prototype.recv_calculate(),代码如下:

var callback = this._reqs[rseqid] || function() {};
delete this._reqs[rseqid];
......
var result = new Calculator_calculate_result();
result.read(input);
input.readMessageEnd();

if (null !== result.ouch) {
	return callback(result.ouch);
}
if (null !== result.success) {
	return callback(null, result.success);
}
return callback('calculate failed: unknown result');

其实就是实例化一个Calculator_calculate_result对象,调用了thrift各种read接口,最后执行callback。

server:

1.createServer()

var Calculator = require("./gen-nodejs/Calculator");
.....
var server = thrift.createServer(Calculator, {
	calculate: function(logid, work, result) {
	    console.log("calculate(", logid, ",", work, ")");
	
	    var val = 0;
	    if (work.op == ttypes.Operation.ADD) {
	      val = work.num1 + work.num2;
	    } else if (work.op === ttypes.Operation.SUBTRACT) {
	      val = work.num1 - work.num2;
	    } else if (work.op === ttypes.Operation.MULTIPLY) {
	      val = work.num1 * work.num2;
	    } else if (work.op === ttypes.Operation.DIVIDE) {
	      if (work.num2 === 0) {
	        var x = new ttypes.InvalidOperation();
	        x.whatOp = work.op;
	        x.why = 'Cannot divide by 0';
	        result(x);
	        return;
	      }
	      val = work.num1 / work.num2;
	    } else {
	      var x = new ttypes.InvalidOperation();
	      x.whatOp = work.op;
	      x.why = 'Invalid operation';
	      result(x);
	      return;
	    }
	
	    var entry = new SharedStruct();
	    entry.key = logid;
	    entry.value = ""+val;
	    data[logid] = entry;
	
	    result(null, val);
	 },
 }

这里的createServer创建了一个tcp/tls的服务,监听的回调如下:

var self = this;
    stream.on('error', function(err) {
        self.emit('error', err);
    });
    stream.on('data', transport.receiver(function(transportWithData) {
      var input = new protocol(transportWithData);
      var output = new protocol(new transport(undefined, function(buf) {
        try {
            stream.write(buf);
        } catch (err) {
            self.emit('error', err);
            stream.end();
        }
      }));

      try {
        do {
          processor.process(input, output);
          transportWithData.commitPosition();
        } while (true);
      } catch (err) {
        ......
      }
    }));

    stream.on('end', function() {
      stream.end();
    });

这里的processor是gen-nodejs/Calculator中export出来的Processor的实例,input和output对象分别包含读和写的thrift接口。

2.processor.process()

CalculatorProcessor.prototype.process = function(input, output) {
  var r = input.readMessageBegin();
  if (this['process_' + r.fname]) {
    return this['process_' + r.fname].call(this, r.rseqid, input, output);
  } else {
    input.skip(Thrift.Type.STRUCT);
    input.readMessageEnd();
    var x = new Thrift.TApplicationException(Thrift.TApplicationExceptionType.UNKNOWN_METHOD, 'Unknown function ' + r.fname);
    output.writeMessageBegin(r.fname, Thrift.MessageType.EXCEPTION, r.rseqid);
    x.write(output);
    output.writeMessageEnd();
    output.flush();
  }
}

this['process_' + r.fname].call(this, r.rseqid, input, output),也就是说当client调用calculate方法时,会执行process_calculate方法。

process_calculate方法中,首先实例化参数对象Calculator_calculate_args,然后调用其read方法解析参数,接着调用我们注册在createServer()中的方法,最后调用output.writeXXX和output.flush()将结果编码并返回。

至此,我们理顺了RPC调用和处理的流程,下面总结一下上面一直用到的Calculator.js中到底有什么:

1.服务中每个方法的参数对象(包含read、write方法)

2.服务中每个方法的调用返回结果对象(包含read、write方法)

3.client,其原型中包含所有同service中声明的方法名相同的方法(比如xxxmethod)、send_xxxmethod、recv_xxxmethod

4.processor,其原型中包含与service所有声明的方法一一对应的process方法(process_xxxmethod)

RPC调用原理

1.server端启动程序,侦听端口,实现提供给client调用的函数,保存在一个对象里。

2.client端启动程序,连接服务端,连接完成后发送describe命令,要求server返回它能提供调用的函数名。

3.server端接收到describe命令,把自己可供调用的函数名包装好发送出去

4.client端接收到server发送的函数名,注册到自己的对象里,给每个函数名包装一个方法,使本地调用这些函数时实际上是向server端发送请求:

5.client端调用server端的函数:

1) 给传入的callback函数生成一个唯一ID,称为callbackId,记录到client的一个对象里。

2) 包装好以下数据发送给server端:调用函数名,JSON序列化后的参数列表,callbackId

6.server端接收到上述信息,解析数据,对参数列表反序列化,根据函数名和参数调用函数。

7.函数运行完成后,把结果序列化,连同之前收到的callbackId发送回client端

8.client端接收到函数运行结果和callbackId,根据callbackId取出回调函数,把运行结果传入回调函数中执行。
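上述步骤可以用一个省略了网络传输与序列化细节的本地回环示意串起来(createServer/createClient均为示意实现,非thrift源码):

```javascript
// server:维护"函数名 -> 实现"的表,处理消息并带着 callbackId 返回结果
function createServer(functions) {
  return {
    handle(msg) { // msg: { name, args, callbackId }
      const result = functions[msg.name](...msg.args); // 第6步:按函数名和参数调用
      return { callbackId: msg.callbackId, result };   // 第7步:结果连同 callbackId 返回
    }
  };
}

// client:为每个 callback 生成唯一 callbackId,收到响应后按 id 取回调执行
function createClient(send) {
  let nextId = 0;
  const callbacks = {};
  return {
    call(name, args, cb) {
      const callbackId = ++nextId; // 第5步:生成唯一 callbackId 并记录回调
      callbacks[callbackId] = cb;
      const reply = send({ name, args, callbackId });
      const fn = callbacks[reply.callbackId]; // 第8步:按 callbackId 取出回调
      delete callbacks[reply.callbackId];
      fn(reply.result);
    }
  };
}
```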

刨根问底之node-gyp

在我们写node addon时,需要使用node-gyp命令行工具,大部分同学会用configue生成配置文件,然后使用build进行构建。但是node-gyp到底是什么?底层有什么呢?下面我们来刨根问底。

本文的线索是自底向上的讲解node-gyp的各层次依赖,主要有以下几个部分:

1. make
2. make install
3. cmake
4. gyp
5. node-gyp

层次结构如下图所示:

make

从源文件到可执行文件叫做编译(包括预编译、编译、链接),而make作为构建工具掌握着编译的过程,也就是如何去编译、文件编译的顺序等。

make是最常用的构建工具,针对用户制定的构建规则(makefile)去执行相应的任务。make会根据构建规则去查找依赖,决定编译顺序等。大致了解可参考Make 命令教程

Makefile(makefile)中定义了make的构建规则,当然也可以自己指定规则文件。例如:

$ make -f rules.txt
# 或者
$ make --file=rules.txt

Makefile由一条条的规则组成,每条规则由target(目标)、source(前置条件/依赖)、command(指令)三者组成。

形式如下:

<target> : <prerequisites> 
[tab]  <commands>

make target时,主要做了以下几件事:

1.检查目标是否存在
2.如果不存在目标
	· 检查目标的依赖是否存在
	· 不存在则调用`make source`;存在并且没有变化(修改时间戳小于target),不操作
	· 执行target中的command指令
3.如果存在目标
	· 检查依赖是否发生变化
	· 没有变化则不需要执行,有变化则执行`make source`后执行command

以编译一个C++文件的规则为例:

hellomake: hellomake.c hellofunc.c
     gcc -o hellomake hellomake.c hellofunc.c -I.

当我们执行make hellomake,会使用gcc编译器编译产出hellomake。如果make不带有参数,则执行makefile中的第一条指令。

make也允许我们定义一些纯指令(伪指令)去执行一些操作,相当于把上面的target写成指令名称,只不过command中不生成文件,所以每次执行该规则时都会执行command。为了和真实的目标文件做区分,make提供了.PHONY关键字,用来告诉make该目标是"假的"(磁盘上其实没有这个目标文件)。例如

.PHONY: clean
clean:
        rm *.o temp

由于执行make时一般只构建一个目标,所以我们可以使用all来将多个目标组合起来。例如:

all: executable1 executable2

一般情况下可以把all放在makefile的第一行,这样不带参数执行make就会找到all。

make install

make install用来安装文件,它从Makefile中读取指令,安装到系统目录中。

cmake

上面提到了make,似乎已经够了,如果我是一个开发者,我定义了makefile,让使用者执行make编译就好了。但是不同平台的编译器、动态链接库的路径都有可能不同,如果想让你的软件能够跨平台编译、运行,必须要保证能够在不同平台编译。如果使用上面的make工具,就得为每一种平台写一次Makefile,这是很繁琐并且容易出错的。

cmake的出现就是为了解决上述问题,它允许开发者编写一种平台无关的CMakeLists.txt文件来定制整个编译流程。cmake会根据操作系统选择不同编译器(当然也可以在CMakeLists.txt中指定),执行cmake时会根据目标平台和自定义的配置生成所需的Makefile或工程文件,如Unix的Makefile、Windows的Visual Studio工程。

CMake是一个跨平台的安装(编译)工具,可以用简单的语句来描述所有平台的安装(编译)过程。它能够输出各种各样的makefile或者project文件,能测试编译器所支持的C++特性,类似UNIX下的automake。

在 linux 平台下使用 CMake 生成 Makefile 并编译的流程如下:

1.编写 CMake 配置文件 CMakeLists.txt 。
2.执行命令 cmake PATH 或者 ccmake PATH 生成 Makefile。其中,PATH是CMakeLists.txt 所在的目录。
3.使用 make 命令进行编译。

CMakeList.txt中由面向过程的一条条指令组成,例如:

# CMake 最低版本号要求
cmake_minimum_required (VERSION 2.8)
# 项目信息
project (Demo3)
# 查找当前目录下的所有源文件
# 并将名称保存到 DIR_SRCS 变量
aux_source_directory(. DIR_SRCS)
# 添加 math 子目录
add_subdirectory(math)
# 指定生成目标 
add_executable(Demo main.cc)
# 添加链接库
target_link_libraries(Demo MathFunctions)

具体可参考cmake文档

GYP

Gyp是一个类似CMake的项目生成工具, 用于管理你的源代码, 在google code主页上唯一的一句slogan是”GYP can Generate Your Projects.”。GYP是由 Chromium 团队开发的跨平台自动化项目构建工具,Chromium便是通过GYP进行项目构建管理。

首先,GYP与cmake类似,那为什么还要有GYP呢?GYP和cmake有哪些相同点、不同点呢?

GYP vs cmake

相同点:

支持跨平台项目工程文件输出,Windows 平台默认是 Visual Studio,Linux 平台默认是 Makefile,Mac 平台默认是 Xcode,这个功能 CMake 也同样支持,只是缺少了 Xcode。

不同点:

配置文件形式不同:GYP的配置文件更像一个"配置文件",在项目设置的层次上进行抽象;而如上所述,CMake的配置更像一个面向过程的脚本。同时GYP支持交叉编译。

具体比较可参考GYP vs. CMake

GYP配置

GYP的配置文件以.gyp结尾,一个典型的.gyp文件如下所示:

{
    'variables': {
      .
      .
      .
    },
    'includes': [
      '../build/common.gypi',
    ],
    'target_defaults': {
      .
      .
      .
    },
    'targets': [
      {
        'target_name': 'target_1',
          .
          .
          .
      },
      {
        'target_name': 'target_2',
          .
          .
          .
      },
    ],
    'conditions': [
      ['OS=="linux"', {
        'targets': [
          {
            'target_name': 'linux_target_3',
              .
              .
              .
          },
        ],
      }],
      ['OS=="win"', {
        'targets': [
          {
            'target_name': 'windows_target_4',
              .
              .
              .
          },
        ],
      }, { # OS != "win"
        'targets': [
          {
            'target_name': 'non_windows_target_5',
              .
              .
              .
          },
      }],
    ],
  }

variables : 定义可以在文件其他地方访问的变量;

includes : 将要被引入到该文件中的文件列表,通常是以.gypi结尾的文件

target_defaults : 将作用域所有目标的默认配置;

targets: 构建的目标列表,每个target中包含构建此目标的所有配置;

conditions: 条件列表,会根据不同条件选择不同的配置项。在最顶级的配置中,通常是平台特定的目标配置。

具体可参考GYP文档

node-gyp

node-gyp是一个跨平台的命令行工具,目的是编译node addon模块。

常用的命令有configure和build。configure的原理是利用gyp生成不同的编译配置文件,build则根据不同平台、不同构建配置进行编译。

configure

我们分步骤看下configure的代码:

findPython(python, function (err, found) {
    if (err) {
      callback(err)
    } else {
      python = found
      getNodeDir()
    }
})

由于GYP是python写的,所以这里首先找当前系统下的python,内部利用的是which这个第三方库。

function getNodeDir () {

    // 'python' should be set by now
    process.env.PYTHON = python

    if (gyp.opts.nodedir) {
      // --nodedir was specified. use that for the dev files
      nodeDir = gyp.opts.nodedir.replace(/^~/, osenv.home())

      log.verbose('get node dir', 'compiling against specified --nodedir dev files: %s', nodeDir)
      createBuildDir()

    } else {
      gyp.commands.install([ release.version ], function (err, version) {
        if (err) return callback(err)
        log.verbose('get node dir', 'target node version installed:', release.versionDir)
        nodeDir = path.resolve(gyp.devDir, release.versionDir)
        createBuildDir()
      })
    }
  }

找到node所在目录,如果没有,则下载node压缩包并解压。

function createBuildDir () {
    log.verbose('build dir', 'attempting to create "build" dir: %s', buildDir)
    mkdirp(buildDir, function (err, isNew) {
      if (err) return callback(err)
      log.verbose('build dir', '"build" dir needed to be created?', isNew)
      if (win && (!gyp.opts.msvs_version || gyp.opts.msvs_version === '2017')) {
        findVS2017(function (err, vsSetup) {
          if (err) {
            log.verbose('Not using VS2017:', err.message)
            createConfigFile()
          } else {
            createConfigFile(null, vsSetup)
          }
        })
      } else {
        createConfigFile()
      }
    })
  }

创建build目录,这里区分了是否有vs,查找vs的方法是打开powershell(windows),试图打开vs。

function createConfigFile (err, vsSetup) {
    if (err) return callback(err)

    var configFilename = 'config.gypi'
    var configPath = path.resolve(buildDir, configFilename)

    if (vsSetup) {
      // GYP doesn't (yet) have support for VS2017, so we force it to VS2015
      // to avoid pulling a floating patch that has not landed upstream.
      // Ref: https://chromium-review.googlesource.com/#/c/433540/
      gyp.opts.msvs_version = '2015'
      process.env['GYP_MSVS_VERSION'] = 2015
      process.env['GYP_MSVS_OVERRIDE_PATH'] = vsSetup.path
      defaults['msbuild_toolset'] = 'v141'
      defaults['msvs_windows_target_platform_version'] = vsSetup.sdk
      variables['msbuild_path'] = path.join(vsSetup.path, 'MSBuild', '15.0',
                                            'Bin', 'MSBuild.exe')
    }

    // loop through the rest of the opts and add the unknown ones as variables.
    // this allows for module-specific configure flags like:
    //
    //   $ node-gyp configure --shared-libxml2
    Object.keys(gyp.opts).forEach(function (opt) {
      if (opt === 'argv') return
      if (opt in gyp.configDefs) return
      variables[opt.replace(/-/g, '_')] = gyp.opts[opt]
    })

    configs.push(configPath)
    fs.writeFile(configPath, [prefix, json, ''].join('\n'), findConfigs)
}

这里创建config.gypi文件,主要包含target_defaults和variables。

// config = ['config.gypi']
  function runGyp (err) {
    if (err) return callback(err)

    if (!~argv.indexOf('-f') && !~argv.indexOf('--format')) {
      if (win) {
        log.verbose('gyp', 'gyp format was not specified; forcing "msvs"')
        // force the 'make' target for non-Windows
        argv.push('-f', 'msvs')
      } else {
        log.verbose('gyp', 'gyp format was not specified; forcing "make"')
        // force the 'make' target for non-Windows
        argv.push('-f', 'make')
      }
    }

    if (win && !hasMsvsVersion()) {
      if ('msvs_version' in gyp.opts) {
        argv.push('-G', 'msvs_version=' + gyp.opts.msvs_version)
      } else {
        argv.push('-G', 'msvs_version=auto')
      }
    }

    // include all the ".gypi" files that were found
    configs.forEach(function (config) {
      argv.push('-I', config)
    })

    // For AIX and z/OS we need to set up the path to the exports file
    // which contains the symbols needed for linking. 
    var node_exp_file = undefined
    if (process.platform === 'aix' || process.platform === 'os390') {
      var ext = process.platform === 'aix' ? 'exp' : 'x'
      var node_root_dir = findNodeDirectory()
      var candidates = undefined 
      if (process.platform === 'aix') {
        candidates = ['include/node/node',
                      'out/Release/node',
                      'out/Debug/node',
                      'node'
                     ].map(function(file) {
                       return file + '.' + ext
                     })
      } else {
        candidates = ['out/Release/obj.target/libnode',
                      'out/Debug/obj.target/libnode',
                      'lib/libnode'
                     ].map(function(file) {
                       return file + '.' + ext
                     })
      }
      var logprefix = 'find exports file'
      node_exp_file = findAccessibleSync(logprefix, node_root_dir, candidates)
      if (node_exp_file !== undefined) {
        log.verbose(logprefix, 'Found exports file: %s', node_exp_file)
      } else {
        var msg = msgFormat('Could not find node.%s file in %s', ext, node_root_dir)
        log.error(logprefix, 'Could not find exports file')
        return callback(new Error(msg))
      }
    }

    // this logic ported from the old `gyp_addon` python file
    var gyp_script = path.resolve(__dirname, '..', 'gyp', 'gyp_main.py')
    var addon_gypi = path.resolve(__dirname, '..', 'addon.gypi')
    var common_gypi = path.resolve(nodeDir, 'include/node/common.gypi')
    fs.stat(common_gypi, function (err, stat) {
      if (err)
        common_gypi = path.resolve(nodeDir, 'common.gypi')

      var output_dir = 'build'
      if (win) {
        // Windows expects an absolute path
        output_dir = buildDir
      }
      var nodeGypDir = path.resolve(__dirname, '..')
      var nodeLibFile = path.join(nodeDir,
        !gyp.opts.nodedir ? '<(target_arch)' : '$(Configuration)',
        release.name + '.lib')

      argv.push('-I', addon_gypi)
      argv.push('-I', common_gypi)
      argv.push('-Dlibrary=shared_library')
      argv.push('-Dvisibility=default')
      argv.push('-Dnode_root_dir=' + nodeDir)
      if (process.platform === 'aix' || process.platform === 'os390') {
        argv.push('-Dnode_exp_file=' + node_exp_file)
      }
      argv.push('-Dnode_gyp_dir=' + nodeGypDir)
      argv.push('-Dnode_lib_file=' + nodeLibFile)
      argv.push('-Dmodule_root_dir=' + process.cwd())
      argv.push('-Dnode_engine=' +
        (gyp.opts.node_engine || process.jsEngine || 'v8'))
      argv.push('--depth=.')
      argv.push('--no-parallel')

      // tell gyp to write the Makefile/Solution files into output_dir
      argv.push('--generator-output', output_dir)

      // tell make to write its output into the same dir
      argv.push('-Goutput_dir=.')

      // enforce use of the "binding.gyp" file
      argv.unshift('binding.gyp')

      // execute `gyp` from the current target nodedir
      argv.unshift(gyp_script)

      // make sure python uses files that came with this particular node package
      var pypath = [path.join(__dirname, '..', 'gyp', 'pylib')]
      if (process.env.PYTHONPATH) {
        pypath.push(process.env.PYTHONPATH)
      }
      process.env.PYTHONPATH = pypath.join(win ? ';' : ':')

      var cp = gyp.spawn(python, argv)
      cp.on('exit', onCpExit)
    })
}

这里主要是区分了不同平台,给GYP命令加入各种参数,其中-I代表include,最后执行gyp脚本生成构建配置文件,比如unix下生成makefile。

build

build比较简单,言简意赅就是区分不同平台,收集不同参数,利用不同编译工具进行编译。

command = win ? 'msbuild' : makeCommand

区分编译工具。

function loadConfigGypi () {
    fs.readFile(configPath, 'utf8', function (err, data) {
      if (err) {
        if (err.code == 'ENOENT') {
          callback(new Error('You must run `node-gyp configure` first!'))
        } else {
          callback(err)
        }
        return
      }
      config = JSON.parse(data.replace(/\#.+\n/, ''))

      // get the 'arch', 'buildType', and 'nodeDir' vars from the config
      buildType = config.target_defaults.default_configuration
      arch = config.variables.target_arch
      nodeDir = config.variables.nodedir

      if ('debug' in gyp.opts) {
        buildType = gyp.opts.debug ? 'Debug' : 'Release'
      }
      if (!buildType) {
        buildType = 'Release'
      }

      log.verbose('build type', buildType)
      log.verbose('architecture', arch)
      log.verbose('node dev dir', nodeDir)

      if (win) {
        findSolutionFile()
      } else {
        doWhich()
      }
    })
}

Load config.gypi and gather another round of parameters for the build (build type, target architecture, node dir). On Windows, it then locates the build/*.sln solution file.
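A minimal sketch of the parsing step: config.gypi starts with a `#` comment line, which is stripped before the rest is parsed as JSON. The sample config string below is fabricated for illustration.

```javascript
// Sketch of how loadConfigGypi reads config.gypi; the sample
// content here is made up, not a real generated config.
const sample = '# Do not edit. Generated by the configure script.\n' +
  '{ "target_defaults": { "default_configuration": "Release" },\n' +
  '  "variables": { "target_arch": "x64", "nodedir": "/usr/local" } }\n'

// strip the leading `#` comment line, then parse the rest as JSON
const config = JSON.parse(sample.replace(/#.+\n/, ''))
const buildType = config.target_defaults.default_configuration
const arch = config.variables.target_arch
console.log(buildType, arch) // → "Release x64"
```

The non-global regex replaces only the first matching comment line, which is all config.gypi contains; the remainder must be valid JSON for `JSON.parse` to succeed.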


  function doBuild () {

    // Enable Verbose build
    var verbose = log.levels[log.level] <= log.levels.verbose
    if (!win && verbose) {
      argv.push('V=1')
    }
    if (win && !verbose) {
      argv.push('/clp:Verbosity=minimal')
    }

    if (win) {
      // Turn off the Microsoft logo on Windows
      argv.push('/nologo')
    }

    // Specify the build type, Release by default
    if (win) {
      var archLower = arch.toLowerCase()
      var p = archLower === 'x64' ? 'x64' :
              (archLower === 'arm' ? 'ARM' : 'Win32')
      argv.push('/p:Configuration=' + buildType + ';Platform=' + p)
      if (jobs) {
        var j = parseInt(jobs, 10)
        if (!isNaN(j) && j > 0) {
          argv.push('/m:' + j)
        } else if (jobs.toUpperCase() === 'MAX') {
          argv.push('/m:' + require('os').cpus().length)
        }
      }
    } else {
      argv.push('BUILDTYPE=' + buildType)
      // Invoke the Makefile in the 'build' dir.
      argv.push('-C')
      argv.push('build')
      if (jobs) {
        var j = parseInt(jobs, 10)
        if (!isNaN(j) && j > 0) {
          argv.push('--jobs')
          argv.push(j)
        } else if (jobs.toUpperCase() === 'MAX') {
          argv.push('--jobs')
          argv.push(require('os').cpus().length)
        }
      }
    }

    if (win) {
      // did the user specify their own .sln file?
      var hasSln = argv.some(function (arg) {
        return path.extname(arg) == '.sln'
      })
      if (!hasSln) {
        argv.unshift(gyp.opts.solution || guessedSolution)
      }
    }

    var proc = gyp.spawn(command, argv)
    proc.on('exit', onExit)
}

Assemble the final arguments and spawn the build command.
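The branching in doBuild can be condensed into a sketch like the one below. `buildArgv` is a hypothetical helper, not node-gyp's real API; it only illustrates how the msbuild and make argument lists differ.

```javascript
// Hypothetical condensation of doBuild's argument assembly.
function buildArgv(win, buildType, arch, jobs) {
  const argv = []
  if (win) {
    // msbuild: map the arch to its Platform name, pass both via /p:
    const p = arch === 'x64' ? 'x64' : (arch === 'arm' ? 'ARM' : 'Win32')
    argv.push('/nologo', '/p:Configuration=' + buildType + ';Platform=' + p)
    if (jobs) argv.push('/m:' + jobs)       // parallel builds
  } else {
    // make: pass the build type and run inside the generated `build` dir
    argv.push('BUILDTYPE=' + buildType)
    argv.push('-C', 'build')
    if (jobs) argv.push('--jobs', String(jobs))
  }
  return argv
}

console.log(buildArgv(false, 'Release', 'x64', 4).join(' '))
// → "BUILDTYPE=Release -C build --jobs 4"
```

Either way, the resulting argv is handed to `gyp.spawn(command, argv)`, where `command` is the msbuild/make choice made earlier.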
