diff --git a/.DS_Store b/.DS_Store
new file mode 100644
index 0000000..da9ee58
Binary files /dev/null and b/.DS_Store differ
diff --git a/scheduler/.DS_Store b/scheduler/.DS_Store
new file mode 100644
index 0000000..499486d
Binary files /dev/null and b/scheduler/.DS_Store differ
diff --git a/scheduler/P1-调度器入口篇.md b/scheduler/P1-调度器入口篇.md
new file mode 100644
index 0000000..7fcad88
--- /dev/null
+++ b/scheduler/P1-调度器入口篇.md
@@ -0,0 +1,189 @@
+# Scheduler Startup
+
+## Preface
+This article covers the scheduler's initialization logic.
+
+## Before the Entry Point
+The entry function is main() in `cmd/kube-scheduler/scheduler.go`, which calls app.NewSchedulerCommand(). Jump to that function and you will see the comment above it:
+
+```
+// NewSchedulerCommand creates a *cobra.Command object with default parameters
+func NewSchedulerCommand() *cobra.Command {
+    ...
+}
+```
+NewSchedulerCommand creates a cobra.Command object, and all later command-line handling is built on cobra. Before going further, it is worth getting to know cobra so that the entry point does not leave you confused from the start.
+
+## cobra
+#### What is cobra?
+GitHub homepage: https://github.com/spf13/cobra
+The homepage describes it as follows: Cobra is a powerful library for creating modern CLI applications, and also a program to generate applications and command files. Many high-profile projects use it, including the familiar Kubernetes and Docker.
+CLIs built with cobra follow the pattern `APPNAME COMMAND ARG --FLAG`, like other common command-line programs such as git: `git clone URL --bare`
+
+#### Installation:
+
+```
+# The simplest way to install. Unsurprisingly, it is not that simple: network restrictions on our side prevent the dependencies from downloading normally.
+go get -u github.com/spf13/cobra/cobra
+
+# What to do? Go into GOPATH and manually fetch the two missing dependencies reported in the error:
+cd /Users/ywq/go/
+mkdir -p src/golang.org/x
+cd src/golang.org/x
+git clone https://github.com/golang/text.git
+git clone https://github.com/golang/sys.git
+
+# Then run:
+go install github.com/spf13/cobra/cobra
+matebook-x-pro:x ywq$ ls /Users/ywq/go/bin/cobra
+/Users/ywq/go/bin/cobra
+# Done. Remember to add the Go bin directory (GOBIN, here /Users/ywq/go/bin) to the PATH environment variable, otherwise the cobra command cannot be run directly
+```
+
+#### A quick trial of cobra:
+```
+matebook-x-pro:local ywq$ cd /Users/ywq/go/src/local/
+matebook-x-pro:local ywq$ cobra init testapp --pkg-name=local/testapp
+matebook-x-pro:local ywq$ ls
+testapp
+matebook-x-pro:local ywq$ ls testapp/
+LICENSE cmd/ main.go
+matebook-x-pro:local ywq$ ls testapp/cmd/
+root.go
+matebook-x-pro:local ywq$ cd testapp
+matebook-x-pro:testapp ywq$ go run main.go
+# Error: subcommand is required (a subcommand must be provided)
+# Because this will be tested many times, all the steps below skip the build step and use go run main.go directly
+```
+**Let's open an IDE and look at how the testapp code is structured:**
+![image](https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/scheduler/image/p1/cobra1.jpg)
+![image](https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/scheduler/image/p1/cobra2.jpg)
+
+```
+# No subcommands exist yet, so let's create a few:
+matebook-x-pro:testapp ywq$ cobra add get
+get created at /Users/ywq/go/src/local/testapp
+matebook-x-pro:testapp ywq$ cobra add delete
+delete created at /Users/ywq/go/src/local/testapp
+matebook-x-pro:testapp ywq$ cobra add add
+add created at /Users/ywq/go/src/local/testapp
+matebook-x-pro:testapp ywq$ cobra add update
+matebook-x-pro:testapp ywq$ ls cmd/
+add.go delete.go get.go root.go update.go
+
+# The help output now lists the newly added subcommands, and they are usable
+matebook-x-pro:testapp ywq$ go run main.go -h
+...
+
+Available Commands:
+  add         A brief description of your command
+  delete      A brief description of your command
+  get         A brief description of your command
+  help        Help about any command
+  update      A brief description of your command
+
+# Try calling the subcommands:
+matebook-x-pro:testapp ywq$ go run main.go get
+get called
+matebook-x-pro:testapp ywq$ go run main.go add
+add called
+```
+
+**How do the newly added subcommands get wired in?**
+![image](https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/scheduler/image/p1/cobra3.jpg)
+The highlighted part of the screenshot shows that each subcommand registers itself on the root command inside its file's init() function (via `rootCmd.AddCommand`). We will skip the underlying implementation for now and move on.
+
+**Testing cobra's powerful and concise flag handling**
+In the init() function of `cmd/delete.go`, define a flag:
+```
+var obj string
+deleteCmd.PersistentFlags().StringVar(&obj, "object", "", "A function to delete a test object")
+```
+In the anonymous `Run: func()` function, add one line of output:
+`fmt.Println("delete obj:", cmd.Flag("object").Value)`
+![image](https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/scheduler/image/p1/cobra4.jpg)
+
+Result:
+
+```
+matebook-x-pro:testapp ywq$ go run main.go delete --object obj1
+delete called
+delete obj: obj1
+
+```
+If the long `--` flag form feels too verbose, cobra also supports a short `-` shorthand:
+![image](https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/scheduler/image/p1/cobra5.jpg)
+
+Result:
+
+```
+matebook-x-pro:testapp ywq$ go run main.go delete -o obj1
+delete called
+delete obj: obj1
+
+```
+
+So far this is only two levels of commands plus a flag, but familiar commands such as `kubectl delete pod xxx` have three levels of commands plus args. How do we add another level of subcommand? cobra does it for you with a single command:
+
+```
+matebook-x-pro:testapp ywq$ cobra add pods -p deleteCmd  # -p names the parent command; generated command variables are named (parentCommandName)Cmd by default
+matebook-x-pro:testapp ywq$ ls cmd
+add.go delete.go get.go pods.go root.go update.go
+
+```
+A new pods.go file now appears under cmd/. Let's see how it is attached to the parent delete command, and add one line of output to it:
+![image](https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/scheduler/image/p1/cobra6.jpg)
+Run the command:
+
+```
+matebook-x-pro:testapp ywq$ go run main.go delete pods pod1
+pods called
+delete pods: pod1
+
+```
+
+#### By now you should have a first impression of how powerful and concise cobra is. Visit the project homepage for details, and install and try it yourself
+
+## The Entry Point
+With this basic understanding of cobra, it is easy to see that main() in `cmd/kube-scheduler/scheduler.go` ultimately calls the anonymous function assigned to the `Run` field of `cobra.Command`. We can confirm this by stepping into `NewSchedulerCommand()`:
+![image](https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/scheduler/image/p1/main1.jpg)
+
+As shown, `Run` calls the `runCommand` function. Now let's look at the key points inside `runCommand`:
+![image](https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/scheduler/image/p1/runCommand.jpg)
+
+The upper part validates the command-line arguments and options, so we skip it. Focus on two variables, `cc` and `stopCh`, which are passed as arguments to the final `Run()` call. `stopCh` carries the main program's exit signal and tells the other goroutines to perform their shutdown work. The other variable, `cc`, is very important. Click into the `c.Complete()` method to see its details:
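+![image](https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/scheduler/image/p1/cc.jpg)
+`Complete()` essentially returns a Config struct. It has a great many fields and space does not allow a screenshot of each, so explore what they do on your own. Here is a brief summary of a few of them:
+
+```
+// All configuration of the scheduler itself lives here, e.g. name, scheduling algorithm, pod affinity weight, leader election, metrics bind address, healthz bind address, bind timeout, etc.
+ComponentConfig kubeschedulerconfig.KubeSchedulerConfiguration
+
+// These fields configure the scheduler's own serving endpoints and the authentication/authorization of requests to them
+InsecureServing        *apiserver.DeprecatedInsecureServingInfo // nil will disable serving on an insecure port
+InsecureMetricsServing *apiserver.DeprecatedInsecureServingInfo // non-nil if metrics should be served independently
+Authentication         apiserver.AuthenticationInfo
+Authorization          apiserver.AuthorizationInfo
+SecureServing          *apiserver.SecureServingInfo
+
+// clientset.Interface wraps sending requests to the apiserver for every resource (pod/deployment/service...) under every supported apiVersion (apps/v1beta2, extensions/v1beta1...)
+Client clientset.Interface
+
+// These fields relate to Event resources: REST API handling, plus recording and broadcasting events
+EventClient v1core.EventsGetter
+Recorder    record.EventRecorder
+Broadcaster record.EventBroadcaster
+```
+The nesting here goes too deep to show in full, but the Config struct is very important and worth reading carefully. Back in `runCommand` in `cmd/kube-scheduler/app/server.go`, continue into the `Run()` function called by its final return. The first part of that function starts the scheduler's supporting components, such as the event broadcaster, informers, the healthz server and the metrics server. Focus on `sched.Run()`, boxed in red in the screenshot: this is the call that actually runs the scheduler's main program:
+![image](https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/scheduler/image/p1/Run.jpg)
+
+Step into `sched.Run()`:
+![image](https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/scheduler/image/p1/scheRun.jpg)
+
+The `wait.Until` call keeps running `sched.scheduleOne` in a loop until the stop signal arrives. At last we have reached the core function at the very bottom of the startup path:
+![image](https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/scheduler/image/p1/scheduleOne.jpg)
+
+`sched.scheduleOne` is a fairly long function. Overall, it fetches a pod that needs scheduling, finds a matching host, sends the request to bind the pod to that host, checks the binding, and so on.
+
+#### That wraps up this entry-point article. The next one starts reading the logic of the scheduling process!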
+
diff --git a/scheduler/README.md b/scheduler/README.md
new file mode 100644
index 0000000..9f7e2e9
--- /dev/null
+++ b/scheduler/README.md
@@ -0,0 +1,78 @@
+# Scheduler Design
+First, the link to the official design doc, which explains things quite vividly:
+https://github.com/kubernetes/community/blob/master/contributors/devel/sig-scheduling/scheduler.md
+Below is a translation of it, combined with my own understanding from reading the code.
+
+### How it works
+The Kubernetes scheduler runs independently of the other major components (such as the API Server). It connects to the API Server and watches for Pods whose PodSpec.NodeName is empty. When such a Pod appears, the scheduler goes to work: it runs a filtering algorithm to select a suitable Node, then sends a binding request to the API Server asking to bind the Pod to the selected Node.
+
+### Code layers
+Turning to the code itself, the scheduler is organized into 3 main code layers:
+- `cmd/kube-scheduler/scheduler.go`: the main() function here is the scheduler's entry point. It reads the given command-line arguments, initializes the scheduler framework and starts it working
+- `pkg/scheduler/scheduler.go`: the code of the scheduler framework as a whole; all of the framework's run and scheduling logic lives here
+- `pkg/scheduler/core/generic_scheduler.go`: while the layer above holds the framework's own scheduling logic, this layer holds the algorithms the scheduler actually applies at runtime. By default, not every listed algorithm is actually used; see the `Schedule()` function in this file
+
+### Scheduling algorithm logic
+Flow diagram:
+```
+For a pod that does not specify Spec.NodeName:
+
+    +---------------------------------------------+
+    |               Schedulable nodes:            |
+    |                                             |
+    | +--------+    +--------+      +--------+    |
+    | | node 1 |    | node 2 |      | node 3 |    |
+    | +--------+    +--------+      +--------+    |
+    |                                             |
+    +-------------------+-------------------------+
+                        |
+                        |
+                        v
+    +-------------------+-------------------------+
+
+    Predicate (hard constraint) filtering: node 3 has insufficient resources
+
+    +-------------------+-------------------------+
+                        |
+                        |
+                        v
+    +-------------------+-------------------------+
+    |             Remaining nodes:                |
+    |   +--------+                 +--------+     |
+    |   | node 1 |                 | node 2 |     |
+    |   +--------+                 +--------+     |
+    |                                             |
+    +-------------------+-------------------------+
+                        |
+                        |
+                        v
+    +-------------------+-------------------------+
+
+    Priority scoring:     node 1: priority=2
+                          node 2: priority=5
+
+    +-------------------+-------------------------+
+                        |
+                        |
+                        v
+            select max{node priority} = node 2
+            node 2 is the node selected and bound to the pod
+```
+To pick a suitable node for the pod, the scheduler goes through the following steps:
+- First, a series of predicates rules out unsuitable nodes. For example, if pod.resources.requests.memory is 16Gi, the scheduler computes node.capacity minus the sum of pod.resources.requests.memory over all pods already on the node; if the difference is less than 16Gi, this predicate evaluates to false and the node is excluded
+- Second, a series of priority functions is computed for the nodes that passed the previous step. They evaluate each node's load, defined as the sum of pod.resources.requests over all pods already on the node divided by node.capacity; the higher the value, the heavier the load and the lower the priority
+- Finally, the node with the highest priority is selected; if several nodes tie, one of them is chosen at random
+
+### Predicates and priorities policies
+The scheduling algorithm consists of two parts, predicates and priorities. Predicates are a set of policies that filter nodes; priorities are a set of policies that rank the remaining nodes. By default, Kubernetes ships built-in predicate/priority policies, whose code lives in `pkg/scheduler/algorithm/predicates/predicates.go` and `pkg/scheduler/algorithm/priorities`.
+
+
+### Extending scheduling policies
+Administrators can choose which of the predefined scheduling policies to apply, and developers can add their own custom policies.
+
+### Modifying scheduling policies
+The default scheduling policies are defined by the defaultPredicates() and defaultPriorities() functions in `pkg/scheduler/algorithmprovider/defaults/defaults.go`. The defaults can be changed with the command-line flag `--policy-config-file CONFIG_FILE`. Alternatively, you can add custom predicate and priority policies to the source in `pkg/scheduler/algorithm/predicates/predicates.go` and `pkg/scheduler/algorithm/priorities`, then register them in `defaultPredicates()`/`defaultPriorities()` to build a custom scheduling policy.
+
+## Scheduler source-reading index
+- [Initialization and startup](https://note.youdao.com/)
+- To be added
diff --git a/scheduler/image/p1/Run.jpg b/scheduler/image/p1/Run.jpg
new file mode 100644
index 0000000..1cd7075
Binary files /dev/null and b/scheduler/image/p1/Run.jpg differ
diff --git a/scheduler/image/p1/cc.jpg b/scheduler/image/p1/cc.jpg
new file mode 100644
index 0000000..882beb0
Binary files /dev/null and b/scheduler/image/p1/cc.jpg differ
diff --git a/scheduler/image/cobra1.jpg b/scheduler/image/p1/cobra1.jpg
similarity index 100%
rename from scheduler/image/cobra1.jpg
rename to scheduler/image/p1/cobra1.jpg
diff --git a/scheduler/image/p1/cobra2.jpg b/scheduler/image/p1/cobra2.jpg
new file mode 100644
index 0000000..f484a08
Binary files /dev/null and b/scheduler/image/p1/cobra2.jpg differ
diff --git a/scheduler/image/p1/cobra3.jpg b/scheduler/image/p1/cobra3.jpg
new file mode 100644
index 0000000..3c2c96d
Binary files /dev/null and b/scheduler/image/p1/cobra3.jpg differ
diff --git a/scheduler/image/p1/cobra4.jpg b/scheduler/image/p1/cobra4.jpg
new file mode 100644
index 0000000..5255eb9
Binary files /dev/null and b/scheduler/image/p1/cobra4.jpg differ
diff --git a/scheduler/image/p1/cobra5.jpg b/scheduler/image/p1/cobra5.jpg
new file mode 100644
index 0000000..806bf37
Binary files /dev/null and b/scheduler/image/p1/cobra5.jpg differ
diff --git a/scheduler/image/p1/cobra6.jpg b/scheduler/image/p1/cobra6.jpg
new file mode 100644
index 0000000..1b2bbaa
Binary files /dev/null and b/scheduler/image/p1/cobra6.jpg differ
diff --git a/scheduler/image/p1/main1.jpg b/scheduler/image/p1/main1.jpg
new file mode 100644
index 0000000..a9bdbbe
Binary files /dev/null and b/scheduler/image/p1/main1.jpg differ
diff --git a/scheduler/image/p1/runCommand.jpg b/scheduler/image/p1/runCommand.jpg
new file mode 100644
index 0000000..50da76b
Binary files /dev/null and b/scheduler/image/p1/runCommand.jpg differ
diff --git a/scheduler/image/p1/scheRun.jpg b/scheduler/image/p1/scheRun.jpg
new file mode 100644
index 0000000..dbda6c8
Binary files /dev/null and b/scheduler/image/p1/scheRun.jpg differ
diff --git a/scheduler/image/p1/scheduleOne.jpg b/scheduler/image/p1/scheduleOne.jpg
new file mode 100644
index 0000000..0a13e2b
Binary files /dev/null and b/scheduler/image/p1/scheduleOne.jpg differ