IoT 8: Making My Search Engine

Last Updated on October 2, 2021

As I mentioned in a past post, IoT 5 (Is Keyword World Ruined?), DuckDuckGo was the search engine for the in-site search of this website. However, results from this website seem to be kept to a minimum. This is because DuckDuckGo draws its results from Bing, and Bing seems to select content simply by how often pages are accessed. Therefore, I'm making my own search engine. Test my search engine (word search).

Here are the search rules.

For security reasons, I haven't published the code of my website. However, here are several ideas behind my search engine.

1. The core is a set of commands and scripts executed on the server. They handle filtering and sorting on my database-less website. I referred to examples of commands using the Schwartzian Transform, Perl scripts, etc. Above all, we need to guard against command injection attacks through the search form. A semicolon (;) can chain another command, and a single quotation mark (') or double quotation mark (") can close a quoted argument and let other commands in. If your engine is built with PHP, you can use the function escapeshellarg, which wraps the argument in single quotes (escaping any quotes inside it) so that the shell treats it as one literal string. I also limit the characters that can be searched.
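A minimal Python sketch of the two defenses in point 1, assuming a hypothetical whitelist (letters, digits, spaces, hyphens, and "&" as the word separator used in the examples below). sanitize_query and shell_safe are illustrative names, not the code behind this site; shlex.quote is Python's rough counterpart of PHP's escapeshellarg for POSIX shells.

```python
import re
import shlex

# Hypothetical whitelist: letters, digits, space, hyphen, and '&' (word separator).
# Anything else, such as ';', quotes, '|' or '$', is rejected outright.
ALLOWED = re.compile(r"[A-Za-z0-9 &\-]{1,64}")


def sanitize_query(raw):
    """Return the list of search words, or raise ValueError for a bad query."""
    if not ALLOWED.fullmatch(raw):
        raise ValueError("query contains characters that are not allowed")
    # Split on '&' and drop empty pieces such as those produced by "Nice&&Guitar".
    words = [w.strip() for w in raw.split("&") if w.strip()]
    if not words:
        raise ValueError("empty query")
    return words


def shell_safe(word):
    """Quote one word for a POSIX shell, similar in spirit to escapeshellarg."""
    return shlex.quote(word)
```

For example, sanitize_query("High Power&Nice&Guitar") returns ["High Power", "Nice", "Guitar"], while a query containing ";" is refused before it reaches any shell. Whitelisting and quoting are complementary: the whitelist narrows what can arrive, and quoting protects whatever does.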

2. Multiple words can be searched at once across multiple lines. For example:

Good Guitar:

 - Nice

 - Middle Power

Better Guitar:

 - Nice

 - High Power

Try searching for Nice&Guitar and for High Power&Nice&Guitar, and compare the results.

To cover multiple lines, I keep the 2 lines before and the 2 lines after each match in the search results. If you search for multiple words, the procedure is repeated for each word by piping the output of one stage into the next.
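The context-line idea can be sketched in Python as follows. This is an illustration, not the author's pipeline: the first word selects matching lines plus 2 lines of context on each side (overlapping windows are merged, much like GNU grep's -B and -A options), and every further word filters the surviving blocks, the way one more command in a pipe narrows the previous output. context_ranges and search are hypothetical names, and matching is assumed to be case-insensitive.

```python
def context_ranges(lines, word, before=2, after=2):
    """Return (start, end) index ranges, end exclusive, around lines containing word."""
    needle = word.lower()
    ranges = []
    for i, line in enumerate(lines):
        if needle in line.lower():
            start = max(0, i - before)
            end = min(len(lines), i + after + 1)
            # Merge with the previous window when they overlap or touch.
            if ranges and start <= ranges[-1][1]:
                ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
            else:
                ranges.append((start, end))
    return ranges


def search(lines, words, before=2, after=2):
    """First word builds context blocks; each later word keeps only blocks containing it."""
    blocks = [lines[s:e] for s, e in context_ranges(lines, words[0], before, after)]
    for word in words[1:]:  # one extra "pipe stage" per additional word
        needle = word.lower()
        blocks = [b for b in blocks if any(needle in line.lower() for line in b)]
    return blocks
```

With the guitar example stored as six lines (blank lines removed), search(lines, ["High Power", "Nice", "Guitar"]) keeps only the "Better Guitar:" block, while search(lines, ["Nice", "Guitar"]) returns one merged block covering both guitars, which is why the two test queries give different results.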

3. Results should be sorted and arranged. In my search engine, the results within one post are combined, and the searched words are highlighted.
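Point 3 and the Schwartzian Transform mentioned in point 1 can be combined in one small sketch. The assumptions are loud ones: hits arrive as hypothetical (post, block) pairs, posts are ranked by their number of matching blocks (ties broken by title), and highlighting uses a <b> tag. The page text is HTML-escaped segment by segment so that a highlighted word can never produce broken or injected markup.

```python
import html
import re
from collections import defaultdict


def highlight(text, words):
    """HTML-escape text and wrap every (case-insensitive) match of words in <b>."""
    words = sorted((w for w in words if w), key=len, reverse=True)  # longest first
    if not words:
        return html.escape(text)
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    parts, pos = [], 0
    for m in pattern.finditer(text):
        parts.append(html.escape(text[pos:m.start()]))
        parts.append("<b>" + html.escape(m.group(0)) + "</b>")
        pos = m.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


def arrange(hits, words):
    """Combine blocks per post, rank posts, and return [(post, highlighted_html)]."""
    grouped = defaultdict(list)
    for post, block in hits:
        grouped[post].append(block)
    # Schwartzian Transform: decorate with a sort key, sort on it, then undecorate.
    decorated = [((-len(blocks), post), post, blocks) for post, blocks in grouped.items()]
    decorated.sort(key=lambda item: item[0])
    results = []
    for _key, post, blocks in decorated:
        combined = "\n--\n".join("\n".join(block) for block in blocks)
        results.append((post, highlight(combined, words)))
    return results
```

In everyday Python, sorted(..., key=...) performs this decoration internally; the explicit decorate/sort/undecorate form is kept here because it mirrors the Perl idiom.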

The core of my search engine can be written as a pipeline in a shell script or as Perl scripts. In the modern structure, PHP sits on the front line facing users. I'm not against this structure, given the security issue with file permissions of CGI scripts, including Perl ones: users accessing the site as guests have permission to execute those scripts. However, this issue has already been resolved on the server side, e.g., by suEXEC in Apache HTTP Server. PHP implicitly sends the "Content-type: text/html\n\n" header, so you never have to think about its standard output. Can you explain why PHP may print debugging messages on the page you access? I think calling Perl from PHP to edit and arrange text is a good idea because this structure makes the code easy to write, especially for regular expressions. I'll explain the structure later.

However, Halloween is coming, and putting up Halloween decorations under the clear autumn sky takes priority over computer code. I once had a year when Halloween passed without any decorations because of crazy days at my desk, and that winter I felt the daemon of time.
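As a rough sketch of the front-end/back-end split discussed above, here is a Python stand-in for the PHP front end. run_filter and respond are hypothetical helpers, not the author's code, and the filter command stands in for whatever Perl script does the text processing. Passing the command as an argument list means no shell parses the query at all. respond also hints at one answer to the PHP question: a CGI program must write the header and a blank line before anything else, and because PHP has already done that, anything a PHP script writes to standard output, debugging messages included, lands in the page body. Here, debugging output goes to standard error instead, which Apache writes to its error log for CGI programs.

```python
import subprocess
import sys


def run_filter(argv, text):
    """Feed text to an external filter (e.g. a Perl script) and return its stdout.

    argv is a list, so no shell is involved: characters such as ';' are passed
    to the program as plain argument or input text, never run as commands.
    """
    result = subprocess.run(argv, input=text, capture_output=True, text=True, check=True)
    return result.stdout


def respond(body_html, out=sys.stdout, log=sys.stderr):
    """Write a CGI-style response: header and blank line first, then the body."""
    out.write("Content-type: text/html\n\n")
    out.write(body_html)
    log.write("debug: %d bytes sent\n" % len(body_html))  # stderr, not the page
```

A call might look like run_filter(["perl", "filter.pl", word], page_text), where filter.pl is a hypothetical script name.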

Please feel free to contact me by e-mail if you have any opinions or comments on this site.
Link: E-mail Address