Project SPa: A SQL Parser for Teaching SQL
Centre de Recherche Créatech sur les Organisations Intelligentes
Creating value from data is a source of business competitivity. However, many organizations still use flat files and data extracts to carry out their analyses. This approach is time consuming, prone to errors and inefficient. With the ubiquitous use of information systems in business, learning to manipulate data stored in a database with the SQL language is an essential skill for data analysts and data scientists. More generally, understanding database basics offers a unique view on business processes and various data-driven value creation opportunities.
The Centre de Recherche Créatech sur les Organisations Intelligentes is a research center at Université de Sherbrooke that focuses on the use of data and information to improve the performance of organizations. This project is an initiative of Pr. Daniel Chamberland-Tremblay.
Pr. Chamberland-Tremblay teaches Business Technology Management and Business Intelligence at École de gestion at Université de Sherbrooke. His research interests include data management, data governance, data security and the data-based value creation process.
Traditionally, teaching the use of SQL language for solving business problems has largely been based on learning its inner working one operator at a time. If this approach best serves science and engineering students, it can be a challenge for some business students that do not have a formal training in mathematics.
To devote more time to problem structuring and student feedback, instructors can benefit from a SQL parser that offers a detailed feedback on and automatic grading of the student works. This project aims at building such a parser.
Objectives and limits
The main objective of this project is to develop a SQL language parser to support teaching in the field of business technology management. More specifically, the project goals are:
- Develop a SQL parser that provides detailed feedback (beyond the traditional syntax error) on the student work in an English/French readable form. The parser should support grading and detailed feedback. If time allows, the parser could be inserted into a website and be directly accessible to the students.
- Test the SQL parser with a standardised set of SQL queries.
This project uses the open-source MIKE2.0 (Method for an Integrated Knowledge Environment).
MIKE2.0 is an open-source methodology for Enterprise Information Management that provides a framework for iterative information development. The main goals are:
- Driving an overall approach through an organization's Information Strategy
- Enabling people with the right skills to build and manage new information systems while creating a culture of information excellence
- Moving to a new organisational model that delivers an improved information management competency
- Improving processes around information compliance, policies, practices, and measurement
- Delivering contemporary technology solutions that meet the needs of highly federated organizations
Information development starts with the assessment of the organizational business and technological contexts. Combined with the organizational information objectives, theses phases result in a gap analysis that details priorities for the implementation phases. The iterative nature of information development ensures that information resources are developed in an incremental manner that fits the evolving organizational needs.
As the project progresses, steps can be added, modified, or removed to improve the different aspects of data management from data identification and collection to data preparation and analysis.
This project will rely on the well-established Python library sqlparse. sqlparse is a non-validating SQL parser for Python. It provides support for parsing, splitting, and formatting SQL statements.
The project will be hosted on GitHub.
The iterative nature of the project warrants for a regular monitoring. Typically, students are expected to present their progress every week, or every two weeks, in short briefings similar to Scrum stand-up meetings. During these meetings, the student will be asked to report on the work done, on the tasks to come and on the issues blocking progress.
Students can also contact the supervisor through email or Teams at any time to resolve blocking issues.
This project can be designed to fit internships ranging from 45h to 225h, or event more. Given the exact availability of the candidate, the project can be limited to few data sources or be expanded to a full-blown SQL parser that spans different SQL dialects.
The project is divided in five stages:
Stage 1: Project setup and kickoff
This stage is dedicated to the installation of the project technological environment, including Python, sqlparse and their dependencies. Functional tests to ensure the adequate behavior of the components will be carried out. During this stage, the candidate will be required to familiarise his or herself with the basics of each framework.
Stage 2: Building a proof of concept
At this stage, the candidate will build a simple, yet functional, SQL parser from a standardized set of SQL queries. Deliverables include the proof-of-concept parser, a model of the SQL validation process and a first sample of output.
Stage 3: Scaling validation
This stage is dedicated to expanding the capabilities of the parser to the different SQL statements, keywords, functions, etc. in a software architecture that enables further development. Deliverables include the parser and a SQL test set supporting analysis. This stage can be scaled up or down depending on the internship duration by adding new SQL dialects and improving standardized feedback or limit the scope to a few keywords.
Stage 4: Offering understandable feedback to the Web (optional stage)
If the candidate reaches a stable parser within a reasonable timeframe of the internship, integration of the functionalities to a Django website will be encouraged. With the use of the website, students will be able to directly submit their work for immediate feedback. A basic knowledge of Django is required if the candidate wishes include this stage during the internship.
Stage 5: Reporting on the project
The last, but mandatory, stage is project reporting. The candidate will be asked to package all development in a manner that supports reuse by others. A short project report will also be required before the completion of the internship.
The salary for the internship ranges from 18$/h to 22$/h depending on experience and skills.
The internship candidate should be a graduate student knowledgeable in Python and SQL. A basic knowledge of Django would be an asset.
Candidates that want to know more about this project or that wish to apply should contact Pr. Chamberland-Tremblay through email at email@example.com.