WordNet Punjabi (Shahmukhi)
This desktop application provides an intuitive interface for searching and retrieving lexical information from an Excel-based database. Built with the PyQt5 framework, it lets users enter a word, select a part-of-speech (POS) category, and choose which details of the word to extract. The available POS categories are read from the Excel file, and the relevant information is retrieved from the database's sheets and displayed in a text box. A search can also be triggered from the keyboard by pressing Enter, and a 'Contact' button links to email support. The tool is aimed at linguists and others who work with lexical data, because it makes looking up entries in a large lexical database quick and straightforward.
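The lookup flow described above can be sketched in plain Python. This is a hypothetical sketch, not the tool's code: it assumes the workbook has already been loaded into nested dictionaries (for example from pandas `read_excel(path, sheet_name=None)`, which returns one DataFrame per sheet), with one sheet per POS category keyed by headword. The names `list_pos_categories`, `lookup`, and the `gloss` field are illustrative. In a PyQt5 GUI, a function like `lookup` could be connected to `QLineEdit.returnPressed` so that pressing Enter starts a search.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the Excel workbook: one "sheet" per POS category,
# each mapping a headword to its lexical fields (field names are illustrative).
Sheet = Dict[str, Dict[str, str]]
Workbook = Dict[str, Sheet]


def list_pos_categories(workbook: Workbook) -> List[str]:
    """Return the POS categories (sheet names) offered in the selector."""
    return sorted(workbook)


def lookup(workbook: Workbook, word: str, pos: str) -> Optional[str]:
    """Find `word` in the sheet for `pos`; return text for the result box."""
    sheet = workbook.get(pos)
    if sheet is None:
        raise KeyError(f"unknown POS category: {pos}")
    entry = sheet.get(word.strip())
    if entry is None:
        return None  # the GUI could show a "not found" message instead
    return "\n".join(f"{field}: {value}" for field, value in entry.items())


if __name__ == "__main__":
    workbook: Workbook = {
        "Noun": {"پانی": {"gloss": "water"}},  # hypothetical sample entry
        "Verb": {},
    }
    print(list_pos_categories(workbook))
    print(lookup(workbook, "پانی", "Noun"))
```

Keeping the lookup separate from the GUI makes it easy to test without starting the PyQt5 event loop.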
Rule-based stemmer
The graphical user interface (GUI) has two tabs: one for stemming and one for morphological analysis. Users enter words for analysis, and the results are displayed in tables that show each original word alongside its stem or morphological breakdown. Word lists can also be loaded from text files and processed in bulk, and the results can be saved as text, CSV, or Excel files. A clear option resets the inputs between tasks. Together, these features make the tool well suited to linguistic research and natural language processing (NLP) applications for Punjabi Shahmukhi.
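The bulk workflow above (load a word list, pair each word with its result, save as text or CSV) might look roughly like the following minimal sketch. The function names are hypothetical, and the stemming function is passed in as a parameter rather than reproducing the tool's rules. Excel export is omitted because it needs a third-party package such as openpyxl.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def load_words(path: Path) -> List[str]:
    """Read one word per line from a UTF-8 text file."""
    return path.read_text(encoding="utf-8").splitlines()


def process_word_list(words: Iterable[str], stem: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Pair each non-empty word with its stem, as in the results table."""
    cleaned = (w.strip() for w in words)
    return [(w, stem(w)) for w in cleaned if w]


def save_results(rows: List[Tuple[str, str]], path: Path) -> None:
    """Save as CSV when the file ends in .csv, otherwise as tab-separated text."""
    is_csv = path.suffix.lower() == ".csv"
    # A UTF-8 byte-order mark helps Excel recognise Shahmukhi text in a CSV file.
    encoding = "utf-8-sig" if is_csv else "utf-8"
    with path.open("w", encoding=encoding, newline="") as f:
        if is_csv:
            writer = csv.writer(f)
            writer.writerow(["word", "stem"])
            writer.writerows(rows)
        else:
            for word, stem in rows:
                f.write(f"{word}\t{stem}\n")
```

Choosing the output format from the file extension mirrors a typical "Save as" dialog, where the user picks `.txt` or `.csv`.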
Punjabi (Shahmukhi) USAS Tagger
This Tkinter-based desktop application tags words in the Urdu (Shahmukhi) script using a USAS (UCREL Semantic Analysis System) tagging dictionary. Users load an Excel file of words with their tags, enter text, and the tool assigns each word its semantic category from that file. The application displays the tagged words, identifies untagged ones, and can highlight untagged words in the input. It also provides text search, a choice of output formats, and saving of results as text. Untagged words can be saved to a separate file, which makes the tool useful for linguistic analysis and annotation of Urdu-script text.
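Dictionary-based tagging as described above reduces to a lookup per token plus a record of the misses. The sketch below is hypothetical: it assumes whitespace tokenization, strips Urdu and Latin punctuation from token edges, and marks misses with `Z99`, the tag USAS uses for unmatched words. A lexicon entry such as `{"روٹی": "F1"}` (F1 is the USAS food category) would be one row of the Excel file.

```python
from typing import Dict, List, Tuple

# Punctuation stripped from token edges: Urdu full stop, comma, question mark,
# plus common Latin marks (an assumption about how the tool tokenizes).
PUNCT = "۔،؟!.,?;:\"'()"
UNMATCHED = "Z99"  # USAS convention for words with no match


def usas_tag(text: str, lexicon: Dict[str, str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Tag whitespace-separated tokens by dictionary lookup.

    Returns the (word, tag) pairs and the distinct untagged words in order
    of first appearance, ready to be saved to a separate file.
    """
    tagged: List[Tuple[str, str]] = []
    untagged: Dict[str, None] = {}  # a dict keeps insertion order and drops repeats
    for raw in text.split():
        word = raw.strip(PUNCT)
        if not word:
            continue
        if word not in lexicon:
            untagged[word] = None
        tagged.append((word, lexicon.get(word, UNMATCHED)))
    return tagged, list(untagged)
```

Returning the untagged words as a separate, de-duplicated list matches the option to save them on their own.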
Corpus generator
This Tkinter-based application builds a text corpus from web pages supplied by the user. Users can type in or import multiple URLs; the tool extracts the text of each page with BeautifulSoup and saves the collected content either in a single file or in one file per page, in .txt or .docx format. A progress bar tracks the extraction, and the saving format can be chosen in the interface. A help function opens the default email client for user support. The tool is aimed at researchers and students who need to gather large amounts of text data efficiently.
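The extraction step can be sketched as follows. The tool itself uses BeautifulSoup; this hypothetical sketch swaps in the standard library's `html.parser` so it runs without third-party packages, and it skips `<script>` and `<style>` contents. The class and function names are illustrative, and the .docx output is left out.

```python
import urllib.request
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the page's visible text, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a page and extract its text (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return extract_text(resp.read().decode(charset, errors="replace"))


def build_corpus(urls: List[str]) -> str:
    """Concatenate all pages' text, as for the single-file saving option."""
    return "\n\n".join(fetch_text(u) for u in urls)
```

For the one-file-per-page option, each `fetch_text` result would be written to its own file instead of being joined. A GUI would also update its progress bar after each URL.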
Sentence extractor
This PyQt5-based application extracts, from a text corpus, the sentences that contain words listed in an Excel file. Users upload an Excel file with the words and a text file with the corpus; the tool then searches the corpus for each word and extracts the sentences in which it occurs, keeping up to 10 words of context on either side of the match. A progress bar tracks processing, and the extracted data can be saved to an Excel file. A 'Help' tab provides usage instructions. The tool is well suited to textual analysis and data extraction tasks.
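The context window described above (up to 10 words on either side of each match) can be sketched with plain token slicing. This is a hypothetical sketch: it assumes whitespace tokenization with punctuation stripped before matching, and it returns results grouped by keyword in the order the keywords are given.

```python
from typing import Dict, Iterable, List, Tuple

PUNCT = "۔،؟!.,?;:\"'()"  # stripped from token edges before matching (assumption)


def keyword_contexts(corpus: str, keywords: Iterable[str], window: int = 10) -> List[Tuple[str, str]]:
    """For each keyword, return every occurrence with up to `window` words
    of context on either side, as (keyword, snippet) pairs."""
    tokens = corpus.split()
    # Index token positions once, so each keyword lookup is a dict access.
    positions: Dict[str, List[int]] = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok.strip(PUNCT), []).append(i)

    results: List[Tuple[str, str]] = []
    for kw in keywords:
        for i in positions.get(kw, []):
            start = max(0, i - window)  # clamp at the start of the corpus
            snippet = " ".join(tokens[start : i + window + 1])  # slicing clamps the end
            results.append((kw, snippet))
    return results
```

Indexing positions once keeps the search linear in corpus size even for long word lists. The resulting pairs could be written to Excel with, for example, `pandas.DataFrame(results, columns=["word", "context"]).to_excel(path, index=False)`, which needs openpyxl installed.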
Punjabi (Shahmukhi) Morphological Analyzer
This desktop application, built with PyQt6, performs stemming and morphological analysis of Punjabi Shahmukhi words. It has two main functions: a Stemmer, which applies predefined rules to find the stem of a word while leaving specific common words unstemmed, and a Morphological Analyzer, which uses linguistic rules to break words down into prefixes, roots, and suffixes.
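A rule-based stemmer with an exception list, and a prefix/root/suffix splitter built on top of it, might look like the following minimal sketch. The affix lists and exception words are illustrative placeholders, not the tool's actual rules. Longest-match-first ordering and a minimum stem length are common safeguards, assumed here.

```python
from typing import Iterable, Set, Tuple

# Illustrative rules only; these are NOT the tool's actual rule set.
SUFFIXES: Tuple[str, ...] = ("یاں", "اں", "ے", "ی")
PREFIXES: Tuple[str, ...] = ("بے", "نا")
EXCEPTIONS: Set[str] = {"نال", "وی"}  # hypothetical common words left unstemmed
MIN_STEM = 2  # never strip an affix if fewer than 2 letters would remain


def stem(word: str, suffixes: Iterable[str] = SUFFIXES, exceptions: Set[str] = EXCEPTIONS) -> str:
    """Strip the longest matching suffix, unless the word is an exception."""
    if word in exceptions:
        return word
    for suf in sorted(suffixes, key=len, reverse=True):  # longest match first
        if suf and word.endswith(suf) and len(word) - len(suf) >= MIN_STEM:
            return word[: -len(suf)]
    return word


def analyze(
    word: str,
    prefixes: Iterable[str] = PREFIXES,
    suffixes: Iterable[str] = SUFFIXES,
    exceptions: Set[str] = EXCEPTIONS,
) -> Tuple[str, str, str]:
    """Split a word into (prefix, root, suffix); absent parts are ''."""
    if word in exceptions:
        return "", word, ""
    prefix = ""
    for pre in sorted(prefixes, key=len, reverse=True):
        if pre and word.startswith(pre) and len(word) - len(pre) >= MIN_STEM:
            prefix = pre
            break
    rest = word[len(prefix):]
    root = stem(rest, suffixes, exceptions=set())  # exceptions already checked
    return prefix, root, rest[len(root):]
```

With these placeholder rules, `stem("کتاباں")` strips the plural ending `اں` and returns `کتاب`. Passing the rule lists as parameters lets the same functions be tested with any rule set.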
Punjabi (Shahmukhi) POS Tagger
This Tkinter-based graphical tool performs part-of-speech (POS) tagging of Punjabi text written in the Shahmukhi script, the Urdu-based Perso-Arabic alphabet used for Punjabi. Users load a custom dictionary from an Excel file and enter text, and the tool automatically tags each word with its corresponding POS label. The interface provides an input text area, an output area for the tagged text, and options to save both the tagged and the untagged words. Untagged words are highlighted in the input text so they are easy to identify. The tool also offers a search function, a choice of output layout (rows or columns), and buttons to clear or reset the text areas between tasks. These features simplify tagging Punjabi text, making the tool especially useful for linguistic research, teaching, and natural language processing (NLP) projects that require words to be tagged and classified.
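The two output layouts and the highlighting of untagged words described above can be sketched as follows. This is a hypothetical sketch: the `word_TAG` inline form for "rows" and the tab-separated form for "columns" are assumed formats, and the tag names in the test data are placeholders. The character offsets returned by `untagged_spans` could be turned into Tk text indices such as `f"1.0+{start}c"` and passed to a `Text` widget's `tag_add` method to apply a highlight tag.

```python
import re
from typing import Dict, List, Sequence, Tuple

PUNCT = "۔،؟!.,?;:\"'()"  # stripped from token edges (assumption)


def format_tagged(pairs: Sequence[Tuple[str, str]], layout: str = "rows") -> str:
    """Render (word, tag) pairs inline ("rows") or one pair per line ("columns")."""
    if layout == "rows":
        return " ".join(f"{word}_{tag}" for word, tag in pairs)
    if layout == "columns":
        return "\n".join(f"{word}\t{tag}" for word, tag in pairs)
    raise ValueError(f"unknown layout: {layout!r}")


def untagged_spans(text: str, lexicon: Dict[str, str]) -> List[Tuple[int, int]]:
    """Character offsets (start, end) of words missing from the lexicon."""
    spans: List[Tuple[int, int]] = []
    for match in re.finditer(r"\S+", text):
        token = match.group()
        word = token.strip(PUNCT)
        if word and word not in lexicon:
            # Skip leading punctuation so only the word itself is highlighted.
            start = match.start() + len(token) - len(token.lstrip(PUNCT))
            spans.append((start, start + len(word)))
    return spans
```

Computing offsets on the raw input text, rather than on the tokens, keeps the highlights aligned with what the user typed.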