We need your help to hack kamusiproject.org!
The Kamusi Project, the web’s leading Swahili language resource, needs coding help in a hurry. We have a lot of old code that works pretty well for what we’ve been doing (running a collaborative online Swahili dictionary), but we need to modify and modernize our back end in order to get where we’re going: a free and open source interlinked dictionary and learning center for dozens of African languages.
Our code has been patched together over the years, with bits and pieces added to meet specific needs of the moment. Now we have a tangle of Perl and PHP overlaying a MySQL database, shoehorned into a Drupal platform and installed hastily on a new server. Can you help us get our current functionality working smoothly as a coherent PHP/Drupal/MySQL system, so that we are in a position to expand our model to multiple languages of Africa?
If you can volunteer, please contact CodeAfrica {at} kamusiproject {dot} org
Specifically, our task list includes:
• Converting all our Perl code into PHP. The Perl code is old school, poorly commented, and doesn’t play well with Drupal. This includes four major features: (1) the Edit Engine, through which participants contribute to the dictionary and submissions get processed for the MySQL database; (2) the Grouping Tool, through which users can sort dictionary entries; (3) the Photo Uploader, through which users submit images to illustrate dictionary entries; and (4) the Bantu languages Verb Parser, which makes it possible to look up ridiculously complicated conjugated verbs such as “nitakapomsomesha”.
• Perfecting and enhancing our new Drupalized search engine. We’ve already designed a new PHP dictionary search engine to replace the fossilized Perl system you can see in action now, but we have some distance before the new version is ready to go live. The main challenges are: (1) reflecting live database updates from the Edit Engine; (2) presenting results to match the current format; and (3) building in some kick-ass advanced search features. We expect to go from 1 million searches per month today to tens of millions when we go multilingual, so we need a really robust search engine.
• Making Google behave. The search engines were eating us alive on our old server, crawling nonstop with repeated wildcard searches and any term any user with Google Toolbar had ever typed into our search box. We want the search engines to index the site, but we need to figure out how to prevent them from taking over.
• Organizing our log files. We want to sort our search logs so that we can rank our most popular searches in descending order. The goal is a tool that can feed the 10,000 most-searched-for terms (from 45,000,000 searches) to the editors for the languages we’ll be adding. The challenge here is to deal efficiently with the sheer bulk of the dataset.
• Improving the Learning Center code. We have a nice skeleton for a multimedia Swahili learning center, but our funding situation never allowed us to finish developing the online exercises and lesson templates, completing features for interaction between students and instructors, and integrating the exercises with the dictionary and media uploader. Now a university in South Africa has asked whether they can use our learning tools for some of the 11 official languages of that country, and we would like to say yes. These tools are already built in PHP/Drupal.
• Widgets. We would like to display a variety of information as a toggled sidebar widget, similar to the “Most Recent Searches” infobox now installed (which needs to be made scrollable, since it changes too quickly). Widgets will include recently updated entries, recent photo submissions, most popular searches of the day, etc. We would particularly like a widget that displays results from a log file query for the most popular searches that are not actually in the database, then lets a user “check out” a missing entry in conjunction with an Edit Entry submission. We would also like a slide show for the homepage that displays images from the Photo Uploader in the context of the associated dictionary entry. The widgets are low urgency, but an easy and fun way to help out.
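To give a feel for the log-ranking task and the "missing entries" widget, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the real log format is not described here, so the sketch assumes one search term per line, and every function name in it is hypothetical. It reads the log as a stream, so memory grows with the number of distinct terms rather than with the 45,000,000 individual searches.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Set, Tuple


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so 'Kitabu ' and 'kitabu' count together."""
    return " ".join(term.lower().split())


def iter_terms(lines: Iterable[str]) -> Iterator[str]:
    """Yield one normalized term per non-empty line.

    Assumption: each log line holds exactly one search term.
    """
    for line in lines:
        term = normalize(line)
        if term:
            yield term


def top_searches(lines: Iterable[str], limit: int = 10_000) -> List[Tuple[str, int]]:
    """Count terms in a single pass and return the most frequent ones, descending."""
    counts: Counter = Counter()
    for term in iter_terms(lines):
        counts[term] += 1
    return counts.most_common(limit)


def missing_from_dictionary(
    ranked: List[Tuple[str, int]], known_terms: Set[str]
) -> List[Tuple[str, int]]:
    """Keep only ranked searches that have no dictionary entry yet."""
    return [(term, n) for term, n in ranked if term not in known_terms]
```

Because an open file is itself an iterable of lines, a log can be passed in directly, for example `top_searches(open("search.log", encoding="utf-8"))`, where `search.log` is a placeholder file name. The set of known terms would come from the dictionary database; how it is loaded is left open here.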
Once we accomplish these Code Africa tasks, we will be in a position to put the project on steroids, going from two languages to two dozen in a couple of years. Future work will involve re-engineering the database to accommodate multiple languages spoken by over half a billion people, modifying the Edit Engine for each new language, and building a linking tool to join the languages together. In order to accomplish these multilingual goals, though, we really need to get our current code in order.
If you can help, please contact CodeAfrica {at} kamusiproject {dot} org – we look forward to working with you!