apt-get install python-sphinx
apt-get install sphinxsearch
mkdir rest
cd rest/
sphinx-quickstart
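To produce the build/html directory mentioned below, the docs are built after sphinx-quickstart. A minimal sketch, assuming the default quickstart layout with separate source/build directories and a generated Makefile:

cd rest/
make html
# or, equivalently:
sphinx-build -b html source build/html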
I created my first article in reStructuredText:
http://s.yunio.com/!LrAsu
Please download it and untar it on your computer, cd into rest/build/html, and open index.html in Chrome.
I found two problems with the search function of the generated documentation:

1. It cannot find Chinese characters.
2. It cannot find short words.

Please see attachment 1: it is the target article to be searched, and you can see both "is" and "标准" in the text.
Please see attachment 2: searching for the Chinese word 标准, which is in the text, finds nothing.
Please see attachment 3: searching for the short word "is", which is also in the text, finds nothing.

How can I solve this problem?
Edit:
Sphinx only builds an index entry for a whole Chinese sentence, since there are no spaces in it and Sphinx doesn't know where to split it into words. Check the file searchindex.js for the generated indexes.
Try searching for the word '标准表达方式'; it works. ^_^
Sphinx builds its indexes using a Python script, search.py. Looking into it, we can find:
stopwords = set("""
a  and  are  as  at
be  but  by
for
if  in  into  is  it
near  no  not
of  on  or
such
that  the  their  then  there  these  they  this  to
was  will  with
""".split())
That is why short words cannot be found. You can remove words from this list if you want them to appear in the index.
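For illustration, here is a minimal sketch (not Sphinx's actual indexing code) of how such a stopword set drops short words before they ever reach the index:

import re

# A few entries from the stopword list above.
stopwords = set("a and are as at be but by if in into is it the this to".split())

def index_words(text):
    """Sketch of the filtering step: lowercase, split on \\w+, drop stopwords."""
    return [w for w in re.findall(r'\w+', text.lower()) if w not in stopwords]

print(index_words("this is the standard expression"))
# ['standard', 'expression'] -- 'this', 'is', 'the' never make it into the index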
We can also find this line:
word_re = re.compile(r'\w+(?u)')
This is the regular expression that Sphinx uses to split words. Now we can see why it cannot index Chinese words: \w+ matches an entire run of Chinese characters as one "word", because there are no spaces to break on.
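You can verify this in an interactive session (the sample sentence is mine):

import re

word_re = re.compile(r'\w+', re.UNICODE)   # equivalent to r'\w+(?u)'
print(word_re.findall('这是标准表达方式 is a test'))
# ['这是标准表达方式', 'is', 'a', 'test']
# The whole Chinese run comes out as a single token, so a search for 标准 alone fails.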
The solution is to add Chinese word splitting support to this file. Someone has already done it: http://hyry.dip.jp/tech/blog/index.html?id=374
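For illustration only, a minimal sketch of the same idea using the jieba segmentation library (my choice for this sketch; the linked post may use a different segmenter):

import re
import jieba   # third-party Chinese word segmenter: pip install jieba

chinese_re = re.compile(u'[\u4e00-\u9fff]+')

def split_words(text):
    """Split text into words, segmenting runs of Chinese characters with jieba."""
    words = []
    for token in re.findall(r'\w+', text, re.UNICODE):
        if chinese_re.search(token):
            # Let jieba break the Chinese run into real words.
            words.extend(w for w in jieba.cut(token) if w.strip())
        else:
            words.append(token)
    return words

print(split_words('这是标准表达方式 is a test'))
# e.g. ['这是', '标准', '表达', '方式', 'is', 'a', 'test']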
Answer for the Sphinx search engine (sphinxsearch):
I leave it here in case others may find it useful. Thanks to mzjn for pointing it out.
Sphinx does not support Chinese by default, since it cannot recognize the Chinese charset and therefore doesn't know where to split words to build indexes. You need to modify the configuration file to make it index Chinese words.
More specifically, you should modify charset_table, ngram_len, and ngram_chars in sphinx.conf to make it work. You can google these keywords for the proper configuration.
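As a rough sketch (the index name is a placeholder, and the character ranges are an assumption covering the common CJK Unified Ideographs block; check the Sphinx search engine documentation for the values that fit your data), the relevant part of sphinx.conf looks like this:

index my_index
{
    # ... source, path and other settings ...

    # Treat each CJK character as a separate token (1-gram indexing).
    ngram_len     = 1
    ngram_chars   = U+4E00..U+9FFF

    # The n-gram characters must also be listed in charset_table,
    # alongside the usual ASCII letters and digits.
    charset_table = 0..9, A..Z->a..z, _, a..z, U+4E00..U+9FFF
}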
However, Sphinx may generate a huge index this way, since every single Chinese character is treated as a word. So try coreseek instead if you really want to build indexes for Chinese documents.