Data Analytics Competition 2023

Data Analytics Competition & Problem Statement

The UPSTAT 2023 conference features its signature data analytics competition, aimed at encouraging more active participation from students at all levels, from high school all the way to the doctoral level. A team of at least three and no more than five students may enter the competition by submitting their entry information, which includes the team name, school, and the names and e-mail addresses of the team members.

To be eligible for any of the UPSTAT awards, teams will be required to submit a report (maximum of 10 pages) on their results by April 21st, 2023, and must be present at the award ceremony to present their findings. See more about the prizes and awards below.

Dates to Remember

Registration opens on: April 16, 2023
Duration of competition: April 16-21, 2023
Deadline to receive analysis and report: April 21, 2023 by 11:59pm

You have up to five days to analyze the data and showcase your superstar data scientist skills and futuristic teamwork wisdom via a compellingly appealing final report.

Data Analytics Competition Committee

Ernest Fokoue (RIT) - epfeqa@rit.edu
Gregory Babbitt (RIT) - gabsbi@rit.edu

Data Competition Team Registration Form: https://forms.gle/cEbNs6t2osMckQRdA

Problem Statement

We are excited to announce the GPT-3 Data Competition, which will be held as part of the UPSTAT Conference on April 21st and 22nd, 2023. The competition will showcase the potential of large language models like GPT-3 in generating high-quality text, and provide an opportunity for participants to demonstrate their skills and creativity in natural language processing and machine learning.

Challenge description:

The GPT-3 Data Competition challenges participants to use the GPT-3 dataset to generate coherent and relevant text in response to a set of prompts. The goal is to showcase the potential of GPT-3 in generating high-quality text that is grammatically correct, semantically meaningful, and contextually appropriate.

Participants will be provided with a set of prompts related to a specific topic or theme. The prompts may be in the form of questions, statements, or incomplete sentences. Participants will use the GPT-3 dataset to generate text that is coherent and relevant to the given prompts.

Each team is expected to work on exactly three (3) prompts of its own choice, selected from the list of potential prompts provided in the Appendix. Each of the three (3) prompts must come from a different category.
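
As a quick self-check before submitting, a team could verify its selection against the rule above. The following is only a minimal sketch: the (category, prompt) pair representation and the names selection_is_valid and team_choice are hypothetical, and the category names are assumed to match the Appendix headings.

```python
from typing import List, Tuple

# Hypothetical representation: each selection is a (category, prompt) pair,
# with category names taken from the Appendix headings.
Selection = Tuple[str, str]

REQUIRED_PROMPTS = 3  # "exactly three (3) prompts" per the challenge description


def selection_is_valid(selections: List[Selection]) -> bool:
    """Return True if there are exactly three prompts, each from a different category."""
    if len(selections) != REQUIRED_PROMPTS:
        return False
    # Normalize category names so "Statistics" and "statistics " count as the same.
    categories = {category.strip().lower() for category, _ in selections}
    return len(categories) == REQUIRED_PROMPTS


team_choice = [
    ("Statistics", "Explain the difference between correlation and causation."),
    ("Mathematics", "Write a proof of the Pythagorean theorem."),
    ("Programming", "Write a program that calculates the factorial of a given number using recursion."),
]
print(selection_is_valid(team_choice))
```

A selection with two prompts from the same category, or with fewer or more than three prompts, would be rejected by this check.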

The generated text should be grammatically correct, semantically meaningful, and contextually appropriate. It should demonstrate the potential of GPT-3 in generating high-quality text that is similar in style and content to human-written text. The generated text may be in the form of a longer article, or any other format that is appropriate for the given prompts.

Participants will be evaluated based on the quality of the generated text, as well as the creativity, originality, and relevance of their responses to the given prompts. Participants may use any programming language or tool that is compatible with the OpenAI API to generate the text.

Each team must describe as clearly as possible the technical and non-technical process and details used to arrive at its results.

To participate in the competition, participants must register for the UPSTAT Conference and submit their entries by the submission deadline. The winners will be announced at the conference and will receive prizes. See below for more details.

Note: This is a pure GPT-3 competition. All entries generated directly through ChatGPT or other similar models will be voided; participants must use the OpenAI API to generate text based on the given prompts.

Important guidelines

Visit the OpenAI website and apply for an API key.

Once your application is approved, you will receive an API key that you can use to access the GPT-3 dataset.

Familiarize yourself with the OpenAI API documentation, which provides instructions for accessing the dataset and working with its various features.

Use the OpenAI API to download the GPT-3 dataset to your local machine. The following are some possible steps for downloading the dataset using the OpenAI API:

Set your API key, for example through the openai.api_key attribute of the OpenAI Python library or the OPENAI_API_KEY environment variable.

Use the openai.Completion object to send a request to the API for data from the GPT-3 dataset.

Save the data returned by the API to a file on your local machine.
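
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the committee's reference method: to stay self-contained it calls the HTTP completions endpoint with Python's standard library instead of the openai package, and the model name, parameter values, and the helper names build_request and generate_and_save are assumptions that should be checked against the current OpenAI API documentation.

```python
import json
import os
import urllib.request

# Assumptions (not from the announcement): the endpoint URL, the model name,
# and the default parameter values. Check the current OpenAI API documentation.
API_URL = "https://api.openai.com/v1/completions"
MODEL = "text-davinci-003"  # assumed GPT-3 model name; may no longer be available


def build_request(prompt: str, api_key: str, max_tokens: int = 256,
                  temperature: float = 0.7) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for a text completion."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def generate_and_save(prompt: str, out_path: str) -> str:
    """Send the request, save the generated text locally, and return it."""
    api_key = os.environ["OPENAI_API_KEY"]  # keep the key out of source code
    with urllib.request.urlopen(build_request(prompt, api_key)) as response:
        body = json.load(response)
    text = body["choices"][0]["text"]
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return text
```

Calling generate_and_save("Explain what a limit is in calculus and provide an example.", "limit.txt") would require network access and a valid key in the OPENAI_API_KEY environment variable. Recording the prompt and parameter values alongside each output also makes it easier to describe the process in the final report.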

Once you have downloaded the dataset, you can begin exploring it and working with it using natural language processing and machine learning techniques.

It's important to note that working with the GPT-3 dataset requires advanced technical skills and expertise in natural language processing and machine learning. If you're not familiar with these fields, you may want to consider working with a mentor or advisor who can help guide you through the process. Additionally, be sure to review the OpenAI API terms of use and comply with all applicable laws and regulations when working with the dataset.

We encourage all undergraduate and graduate students in data science, statistics, mathematics, engineering, linguistics, physical sciences, biological sciences, social sciences, and computer science to participate in this exciting competition and demonstrate their skills and creativity in natural language processing and machine learning.

Helpful sites

GPT3 Tutorial: How to Download And Use GPT3(GPT Neo)

Rules of the competition

A team must be comprised of at least 3 members and at most 5 members. Due to all the challenges brought about by the pandemic, we will make an exception this year and allow lone rangers (teams of only one person).

All team members must be graduate or undergraduate students with proof of matriculation and/or affiliation (an active university email address and/or a letter from an academic advisor).
All team members must be registered for the UPSTAT 2023 conference.
Each team must enter the competition via enrollment by the team captain. The team captain must send the registration as a single file containing proof of UPSTAT 2023 registration for all team members, the team members' email addresses, and the team name.
Additional rules are found on the conference website. Please be sure to double check that your whole team complies with the rules prior to entering the competition.
At any point during the competition, the captain of the team is allowed to contact the data analytics competition committee of UPSTAT 2023 with questions aimed at helping clarify aspects of the data and/or the competition.

Prizes

1st Place - Gold Medal - Champion (average of 90% from all the judges)
2nd Place - Silver Medal - Runner-up (average of 85% or more)
3rd Place - Bronze Medal (average of 75% or more)

All the prizes come with a monetary award and a beautiful certificate of award to each and every member of the team.

Important Notes:

[Certificate of participation] All the teams that submit a final report will earn at the very least a certificate of participation.
[Honorable mention] An honorable mention may be given to a team that narrowly missed the bronze medal threshold.
[Possibility of no medal] The committee reserves the right to issue no medal at all if no team produces work worthy of an award.
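
To make the thresholds concrete, the sketch below maps a set of judges' scores to the medal level their average would qualify for. It is only an illustration: the function name medal_level is hypothetical, reading the gold threshold as "90% or more" is an assumption, and actual placement also depends on ranking among teams and the committee's judgment, which the code does not model.

```python
from statistics import mean
from typing import Optional, Sequence

# Thresholds (percent) as listed under Prizes; reading the gold threshold
# as "90 or more" is an assumption.
THRESHOLDS = [(90.0, "Gold"), (85.0, "Silver"), (75.0, "Bronze")]


def medal_level(judge_scores: Sequence[float]) -> Optional[str]:
    """Return the highest medal level the average judge score qualifies for."""
    average = mean(judge_scores)
    for threshold, medal in THRESHOLDS:
        if average >= threshold:
            return medal
    return None  # below the bronze threshold


# Three judges scoring 88, 84, and 86 average to 86, which meets the silver threshold.
print(medal_level([88, 84, 86]))  # prints "Silver"
```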

Maximize your chances

Creativity: Judges will be looking for entries that demonstrate a high degree of creativity and originality in their use of the GPT-3 dataset. Participants should aim to develop novel and innovative approaches to working with the data.
Technical Skill: Entries should demonstrate technical skill and mastery of the tools and techniques used to work with the GPT-3 dataset. Participants should aim to create well-designed and well-executed solutions to the challenge.
Relevance: Entries should address a relevant problem or research question related to the use of large language models like GPT-3. Participants should aim to demonstrate the practical and real-world applications of the dataset.

Clarity: Entries should present their results and insights in a clear and concise manner that is easy for judges and audiences to understand. Participants should aim to communicate their findings effectively and avoid technical jargon or overly complex explanations.
Impact: Judges will be looking for entries that have the potential to make a significant impact in their field of study or application. Participants should aim to demonstrate how their work with the GPT-3 dataset could lead to meaningful improvements or advancements in their area of focus.

Appendix: Dictionary of potential prompts to use.

General:

Describe your favorite book, movie, or TV show in your own words.

Write a short story about a person who discovers a magical object.

Describe your ideal vacation destination and why you would love to visit there.

Write a product review for a new tech gadget that you recently purchased.

Write a short article about the benefits of practicing mindfulness and meditation.

Describe a personal experience that taught you an important life lesson.

Write a letter to your future self about your hopes, dreams, and aspirations.

Describe a scientific discovery or technological advancement that you find fascinating.

Write a summary of a recent news article or research paper in your field of study/interest.

Describe a problem or challenge that you have faced and how you overcame it.

Mathematics:

Explain what a limit is in calculus and provide an example.

Write a proof of the Pythagorean theorem.

Describe an application of linear algebra in real-world problem solving.

Write an explanation of the concept of a fractal, and provide an example.

Describe a mathematical problem that you find particularly interesting or challenging.

Statistics:

Explain the difference between correlation and causation.

Describe a statistical technique that can be used to identify outliers in a dataset.

Write an analysis of a dataset that includes a hypothesis test and a confidence interval.

Describe the difference between a Type I and a Type II error in hypothesis testing.

Explain the concept of power in statistical analysis.

Computer Science:

Write an explanation of the difference between a stack and a queue data structure.

Describe an algorithm for sorting an array of integers in ascending order.

Write a brief analysis of the time and space complexity of a given algorithm.

Describe the difference between synchronous and asynchronous communication in computer networking.

Write an explanation of the concept of virtual memory in operating systems.

Programming:

Write a program that generates a random number and prompts the user to guess it.

Write a program that calculates the factorial of a given number using recursion.

Write a program that reads a text file and counts the frequency of each word.

Write a program that implements the quicksort algorithm for sorting an array of integers.

Write a program that simulates a simple game or puzzle, such as tic-tac-toe or Sudoku.

Good luck and Happy AI Exploration to all the teams.