Thanks. I have managed to get searching working as follows, although the refinement is not done yet. Access to our web site is blocked by a firewall :(

As far as I can tell, no one has a perfect solution for documents in various formats. My steps:

1. Index all searchable files and generate the database files to be searched, EVERY NIGHT! (People may post documents and bulletins during the day, and many of the documents are huge.)

2. Search through the generated files when the engine is launched.

Unfortunately, there do not seem to be many tools for converting binary files to plain text. Searchable files must be plain text, such as .html or *.txt files.

3. A search engine like rolia's seems to be able to search .html files only, hmm :). This is a limitation. The catdoc and xls2csv tools may work, and I am trying others as well.
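
The steps above can be sketched roughly as follows. This is only a minimal illustration in Python, not the setup actually running on our site: it assumes pdftotext, catdoc and xls2csv are installed and write plain text to standard output, and the names `converter_command`, `extract_text`, `build_index` and `search` are made up for the example. HTML tags are not stripped here, and ranking is left out entirely.

```python
import re
import subprocess
from collections import defaultdict
from pathlib import Path

# Hypothetical mapping from file extension to the external converter command.
# Each of these tools prints the extracted text to standard output.
CONVERTERS = {
    ".pdf": lambda p: ["pdftotext", p, "-"],  # "-" sends the text to stdout
    ".doc": lambda p: ["catdoc", p],
    ".xls": lambda p: ["xls2csv", p],
}
PLAIN_TEXT = {".txt", ".html", ".htm"}


def converter_command(path):
    """Return the external command for a binary format, or None."""
    build = CONVERTERS.get(Path(path).suffix.lower())
    return build(str(path)) if build else None


def extract_text(path):
    """Return the plain text of one document, or '' if the type is unsupported."""
    if Path(path).suffix.lower() in PLAIN_TEXT:
        return Path(path).read_text(errors="replace")
    cmd = converter_command(path)
    if cmd is None:
        return ""
    result = subprocess.run(cmd, capture_output=True, check=False)
    return result.stdout.decode("utf-8", errors="replace")


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(texts):
    """Step 1: map each word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in texts.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Step 2: return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)
```

A nightly job could call `extract_text` on every file, pass the results to `build_index`, and save the index to disk so that the search script only has to load it. Keeping the conversion step separate from the indexing step means another converter can be added for a new format without touching the search code.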

Any ideas? Any demo sites?

I would appreciate it.

Thanx

Replies, comments and discussions:

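  • Search engine: does anyone have development experience, particularly in indexing *.doc, *.xls, *.pdf?
    The free engines basically do not fully meet the requirements; they need to be modified or rewritten.
    What do you think of pdftotext, catdoc, xls2csv ...? Any better recommendations?
    I am planning to use perlfect (ksearch, htdig) + pdftotext + catdoc + xls2csv.
    I am not at all confident, and time is tight.
    Machine: DEC Unix + perl5.004, medium-size site
    Thanx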
    • Load it into MySQL and enable full-text search.
    • I built a search engine before. Let me know your website name and I can show you a demo.
      • Thanks. I have managed to get searching working as follows, although the refinement is not done yet. Access to our web site is blocked by a firewall :(