diff --git a/doc/src/benchmark.md b/doc/src/benchmark.md
index 3b5f8e95..1f78223c 100644
--- a/doc/src/benchmark.md
+++ b/doc/src/benchmark.md
@@ -4,7 +4,7 @@
 
 We compare training time on 1, 2, 4, and 8 Tesla V100 GPUs, using a subset of LibriSpeech samples whose audio durations are between 6.0 and 7.0 seconds. The results show **near-linear** acceleration with multiple GPUs. In the following figure, the training time (in seconds) is printed on the blue bars.
 
-<img src="images/multi_gpu_speedup.png" width=450><br/>
+<img src="../images/multi_gpu_speedup.png" width=450><br/>
 
 | # of GPU  | Acceleration Rate |
 | --------  | --------------:   |
diff --git a/doc/src/text_front_end.md b/doc/src/text_front_end.md
index 01b60859..64d5cdb0 100644
--- a/doc/src/text_front_end.md
+++ b/doc/src/text_front_end.md
@@ -101,7 +101,7 @@ LP -> LO -> L1(#1) -> L2(#2) -> L3(#3) -> L4(#4) -> L5 -> L6 -> L7
 
 A common approach uses cascaded CRFs: first predict whether a boundary is a PW, then whether it is a PPH, and finally whether it is an IPH.
 
-<img src="images/prosody.jpeg" width=450><br/>
+<img src="../images/prosody.jpeg" width=450><br/>
 
 
 Paper: Ding et al., 2015. Automatic Prosody Prediction For Chinese Speech Synthesis Using BLSTM-RNN and Embedding Features