diff --git a/doc/src/benchmark.md b/doc/src/benchmark.md
index 3b5f8e95..1f78223c 100644
--- a/doc/src/benchmark.md
+++ b/doc/src/benchmark.md
@@ -4,7 +4,7 @@
We compare the training time with 1, 2, 4, and 8 Tesla V100 GPUs (using a subset of LibriSpeech samples whose audio durations are between 6.0 and 7.0 seconds). The results show **near-linear** acceleration with multiple GPUs. In the following figure, the training time (in seconds) is printed on the blue bars.
-![](images/multi_gpu_speedup.png)
+![](../images/multi_gpu_speedup.png)
| # of GPU | Acceleration Rate |
| -------- | --------------: |
diff --git a/doc/src/text_front_end.md b/doc/src/text_front_end.md
index 01b60859..64d5cdb0 100644
--- a/doc/src/text_front_end.md
+++ b/doc/src/text_front_end.md
@@ -101,7 +101,7 @@ LP -> LO -> L1(#1) -> L2(#2) -> L3(#3) -> L4(#4) -> L5 -> L6 -> L7
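A common approach uses cascaded CRFs: first predict whether a boundary is a PW; if it is, go on to predict whether it is a PPH, and then whether it is an IPH.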
-![](images/prosody.jpeg)
+![](../images/prosody.jpeg)
Paper: Ding et al., 2015. Automatic Prosody Prediction For Chinese Speech Synthesis Using BLSTM-RNN and Embedding Features