Our Projects

Explore our innovative projects in computational linguistics and research.

WordNet Punjabi (Shahmukhi)

This tool is a desktop application, built with the PyQt5 framework, for searching and retrieving lexical information from an Excel-based database. The part of speech (POS) categories are read from an Excel file, and each lookup retrieves the relevant information from the corresponding database sheets. Users enter a word, select its POS category, and choose which word details to extract; the results are displayed in a text box. Searches can also be triggered from the keyboard by pressing 'Enter', and a 'Contact' button links to email support. The tool is particularly useful for linguists and anyone else working with lexical data, as it streamlines access to large datasets.
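The lookup described above can be sketched in a few lines. This is a minimal illustration rather than the tool's actual code: the `LEXICON` dictionary stands in for the Excel workbook (one entry per POS sheet), and the column names `word` and `gloss`, the transliterated sample rows, and the function names are all assumptions made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the Excel workbook: one "sheet" per POS category.
# Column names and sample rows are invented for illustration only.
LEXICON: Dict[str, List[dict]] = {
    "Noun": [{"word": "kitaab", "gloss": "book"}],
    "Verb": [{"word": "likhna", "gloss": "to write"}],
}


def pos_categories(lexicon: Dict[str, List[dict]]) -> List[str]:
    """Return the POS categories that would populate the selection list."""
    return sorted(lexicon)


def lookup(lexicon: Dict[str, List[dict]], word: str, pos: str,
           fields: Optional[List[str]] = None) -> Optional[dict]:
    """Find `word` in the sheet for `pos`; optionally keep only some fields."""
    query = word.strip()  # ignore stray whitespace typed into the input box
    for row in lexicon.get(pos, []):
        if row.get("word") == query:
            if fields is None:
                return dict(row)
            return {name: row[name] for name in fields if name in row}
    return None  # word not present in the chosen POS sheet


print(lookup(LEXICON, " kitaab ", "Noun"))
```

Returning `None` for a missing word lets the interface decide how to report it, for example with a "not found" message in the text box.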

Sentence Extractor

This PyQt5-based application processes a text corpus by extracting the sentences that contain words listed in an Excel file. Users upload an Excel file of target words and a text file containing the corpus; the tool then searches the corpus for each word and extracts every match with up to 10 words of context on either side. A progress bar tracks processing, and the extracted data can be saved to an Excel file. A 'Help' tab provides usage instructions. The tool is well suited to textual analysis and data extraction tasks.
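The context-window extraction used by the sentence extractor can be illustrated as follows. This is a sketch under stated assumptions, not the application's code: it tokenizes on whitespace, strips a small set of punctuation marks (including the Urdu-script full stop, comma, and question mark) before matching, and the function name `extract_contexts` is hypothetical.

```python
from typing import Dict, List

# Punctuation stripped before matching; the last three are the Arabic-script
# full stop (U+06D4), comma (U+060C), and question mark (U+061F).
PUNCT = ".,;:!?\"'()\u06d4\u060c\u061f"


def extract_contexts(corpus: str, words: List[str],
                     window: int = 10) -> List[Dict[str, str]]:
    """Return each target word found in `corpus` with up to `window` words
    of context on either side."""
    tokens = corpus.split()
    targets = set(words)
    hits = []
    for i, token in enumerate(tokens):
        bare = token.strip(PUNCT)  # so "word," still matches "word"
        if bare in targets:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append({"word": bare,
                         "context": " ".join(left + [token] + right)})
    return hits


for hit in extract_contexts("a b c d e f g", ["d"], window=2):
    print(hit)
```

Each hit is a dictionary, so the list maps naturally onto spreadsheet rows when the results are written to Excel.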

Punjabi (Shahmukhi) Morphological Analyzer

This tool is a desktop application, built with PyQt6, that performs stemming and morphological analysis for Punjabi Shahmukhi words. It has two main components: a Stemmer, which applies predefined rules to find the stem of a word while leaving a set of specific common words unstemmed, and a Morphological Analyzer, which uses linguistic rules to break words down into prefixes, roots, and suffixes.
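A rule-based stemmer and analyzer of this kind might look roughly like the sketch below. The affix lists, the exception list, and the minimum root length are placeholders written in Latin transliteration; they are not the tool's actual Shahmukhi rule set, which is not given here, and the function names are hypothetical.

```python
from typing import Dict

# Placeholder rules in Latin transliteration (assumptions for illustration).
STOP_WORDS = {"saade", "te"}    # common words that are never stemmed
PREFIXES = ["be"]
SUFFIXES = ["aan", "an", "e"]   # ordered longest first so "aan" wins over "an"
MIN_ROOT = 2                    # never strip a word down to fewer characters


def stem(word: str) -> str:
    """Strip the first matching suffix unless the word is on the exception list."""
    if word in STOP_WORDS:
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_ROOT:
            return word[:-len(suffix)]
    return word


def analyze(word: str) -> Dict[str, str]:
    """Split a word into prefix, root, and suffix using the rule lists."""
    prefix, suffix, root = "", "", word
    for candidate in PREFIXES:
        if root.startswith(candidate) and len(root) - len(candidate) >= MIN_ROOT:
            prefix, root = candidate, root[len(candidate):]
            break
    for candidate in SUFFIXES:
        if root.endswith(candidate) and len(root) - len(candidate) >= MIN_ROOT:
            suffix, root = candidate, root[:-len(candidate)]
            break
    return {"prefix": prefix, "root": root, "suffix": suffix}


print(stem("kitaabaan"))
print(analyze("besharmaan"))
```

Checking the exception list before any rule fires is what keeps frequent function words from being over-stemmed.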

Rule Based Stemmer

The graphical user interface (GUI) has two tabs: one for stemming and one for morphological analysis. Users enter words for analysis, and the results are displayed in tables showing each original word alongside its stem or morphological breakdown. Word lists can be loaded from text files, processed, and saved as text, CSV, or Excel files, and inputs can be cleared between runs. These features make the tool practical for linguistic research and for natural language processing (NLP) applications in Punjabi Shahmukhi.
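Saving the results table as CSV could be done with Python's standard `csv` module, as in this sketch. The column header, the function names, and the choice of the `utf-8-sig` encoding (which adds a byte order mark so that spreadsheet software is more likely to detect the encoding of Shahmukhi text) are assumptions, not details taken from the tool.

```python
import csv
import io
from typing import Iterable, Sequence, Tuple


def results_to_csv(rows: Iterable[Sequence[str]],
                   header: Tuple[str, ...] = ("word", "stem")) -> str:
    """Serialize result rows to CSV text, with a header row first."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def save_csv(rows: Iterable[Sequence[str]], path: str) -> None:
    """Write the CSV text to disk; newline="" avoids doubled line endings."""
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        handle.write(results_to_csv(rows))


print(results_to_csv([("kitaabaan", "kitaab")]))
```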

Punjabi (Shahmukhi) USAS Tagger

This tool is a Tkinter-based desktop application designed for tagging Urdu words using a USAS tagging dictionary. Users can load an Excel file containing tagged words, input text, and tag words based on their corresponding categories. The application displays tagged words, identifies untagged ones, and allows users to highlight untagged words. It features text search functionality, output format options, and the ability to save results in text format. Additionally, users can save untagged words separately, making the tool useful for linguistic analysis and text annotation in Urdu.
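Dictionary-based tagging of this kind reduces to a lookup per token, with misses collected separately. The sketch below assumes whitespace tokenization and a plain `dict` in place of the Excel tagging dictionary; the transliterated entries, their tag assignments, and the `word_TAG` output format are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical tagging dictionary; in the tool this is loaded from Excel.
TAGS: Dict[str, str] = {"kitaab": "Q4.1", "paani": "O1.2"}


def tag_text(text: str,
             tag_dict: Dict[str, str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (tagged pairs, untagged words) for whitespace-separated text."""
    tagged, untagged = [], []
    for token in text.split():
        tag = tag_dict.get(token)
        if tag is None:
            untagged.append(token)   # candidates for highlighting or export
        else:
            tagged.append((token, tag))
    return tagged, untagged


def format_tagged(tagged: List[Tuple[str, str]], sep: str = "_") -> str:
    """Render tagged pairs in a word_TAG style (one possible output format)."""
    return " ".join(f"{word}{sep}{tag}" for word, tag in tagged)


tagged, untagged = tag_text("kitaab xyz paani", TAGS)
print(format_tagged(tagged))
print(untagged)
```

Keeping the untagged words in their own list makes it straightforward to save them separately and extend the dictionary later.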

Corpus Generator

This Tkinter-based application generates a text corpus by extracting content from web links provided by the user. Users can enter or import multiple URLs; the tool extracts the text of each webpage using BeautifulSoup and saves the collected content either as a single file or as separate files, in .txt or .docx format. A progress bar tracks the extraction process, and a help function opens the default email client for user support. The tool is intended for researchers and students who need to gather large amounts of text data efficiently.
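The text-extraction and saving steps can be sketched as below. Note the substitutions: the tool itself uses BeautifulSoup, whereas this example uses the standard library's `html.parser` so that it runs without extra dependencies, and it works on HTML strings that are assumed to have been downloaded already. The class and function names and the output file names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def build_corpus(pages: Dict[str, str], combine: bool = True) -> Dict[str, str]:
    """Map output file names to text: one combined file or one file per page."""
    texts = [html_to_text(html) for html in pages.values()]
    if combine:
        return {"corpus.txt": "\n\n".join(texts)}
    return {f"page_{i}.txt": text for i, text in enumerate(texts, start=1)}


print(html_to_text("<h1>Title</h1><p>Hello <b>world</b></p>"))
```

Separating extraction from the choice of output layout means the same extracted text can feed either the single-file or the per-page saving option.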