LLM Research
While cybersecurity work is dedicated to protecting people and organizations, it may at times negatively impact them. Poorly implemented protection mechanisms may expose clients to the loss of private or confidential information. Intrusion detection processes may lead to mistaken identification, with consequences ranging from the technical to the moral.
Cybersecurity software designers, system administrators, and users should be trained to reason ethically about their decisions and actions. This requires a methodology for reasoning and an easy-to-use decision-support mechanism that provides cybersecurity experts with ethical reasoning tools and advice. In prior work, we defined an ethical reasoning methodology that relies on six simple principles, which are ranked and compared using a trade-off mechanism. Through a series of reasoning steps, the methodology advises the user on whether a particular cybersecurity design, system administration, or use decision is advisable.
The current project seeks to create a problem-scoping tool that collects information through a chat interface, weighs costs and benefits using a reasoning matrix, and then generates advice using an LLM agent trained to apply the reasoning mechanism.
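As a rough, non-authoritative illustration of such a pipeline, the sketch below scores a decision against a weighted set of principles and then asks an LLM for advice via AWS Bedrock's Converse API. The principle names, weights, rating scale, and model ID are all placeholders, not the actual six principles or trade-off mechanism from the prior work.

```python
# Hypothetical sketch only: the principles, weights, and model ID below are
# illustrative placeholders, not the project's actual reasoning matrix.
import boto3

PRINCIPLES = {
    # six hypothetical ranked principles with trade-off weights summing to 1
    "beneficence": 0.25, "non-maleficence": 0.25, "autonomy": 0.15,
    "justice": 0.15, "transparency": 0.10, "accountability": 0.10,
}

def matrix_score(ratings):
    """Weighted trade-off over per-principle ratings in [-1, 1]."""
    return sum(w * ratings.get(p, 0.0) for p, w in PRINCIPLES.items())

def advise(decision, ratings):
    """Combine the matrix score with an LLM-generated explanation."""
    prompt = (
        f"Decision under review: {decision}\n"
        f"Weighted ethical trade-off score: {matrix_score(ratings):+.2f}\n"
        "Walk through the reasoning steps and say whether the decision "
        "is advisable."
    )
    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

print(advise("Deploy deep-packet inspection on employee traffic",
             {"non-maleficence": -0.6, "beneficence": 0.4}))
```

In a full build, the chat interface would elicit the per-principle ratings from the user before this scoring and generation step runs.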
The successful candidate for this internship should be comfortable using LLM agents programmatically via APIs, preferably in the AWS environment, and should have full-stack web development experience. A junior or senior in computer science, electrical engineering, or technology is preferred.
This is a paid internship. The expectation is 10 hours of focused development work per week for ten weeks. If the project produces a paper, the student will be credited as a co-author.
For details, contact Dr. Sorin Adam Matei (smatei@purdue.edu).
Research in Generative AI, Software Security, Embedded Systems, and Compression
The CYNICS group is seeking aspiring researchers to join its team and engage in exciting cross-disciplinary research projects spanning generative AI, software security, embedded systems, and compression. Hourly positions are available to both undergraduate and graduate students. All interested students are invited to contact Dr. Hany Abdel-Khalik (abdelkhalik@purdue.edu) with their CVs.
Generative AI
Get into the guts of popular GenAI tools such as ChatGPT and Claude and find out what makes or breaks them. Familiarity with adapting state-of-the-art algorithms and the ability to build machine-learning models from scratch are highly desired.
Software Security and Embedded Systems
Work on state-of-the-art encryption algorithms for software security and investigate lightweight alternatives for pure hardware deployments. A strong background in electronics and a bias towards hands-on projects are preferred. Red-teaming / reverse-engineering experience is a bonus.
Compression
Investigate the inner workings of several compression algorithms for images, audio, video, time series, and more, and apply them to real problems ranging from reactor modeling and simulation to reliable communication in denied environments. A strong background in statistics and signal processing is preferred.
If interested, send your CV to Dr. Hany Abdel-Khalik (abdelkhalik@purdue.edu).
Lilly Research Opportunity
CERIAS will recruit a team of 2-4 student workers. Students must be US citizens. Desirable candidates will have experience in computer security, software development, and large language models.
Analysts deal with large volumes of indicators every day as part of security and intelligence investigations. Given a set of common indicator classes (emails, IPs, domains, etc.), can we leverage AI/ML technology to add useful context in the form of structured tags, descriptive narratives, and deeper relationship identification between indicators? What kinds of succinct, structured tags are useful to the analyst? How can LLMs be used to enrich indicators with them? How can relationships between existing indicators be surfaced, and which ones are useful? These and other compelling questions require a careful, disciplined, and modular approach, with an eye toward allowing others to operationalize the solution within different environments.
Deliverables:
At a minimum, source code demonstrating the solution, as well as a final presentation discussing it at length.
Depending on the approach, deliverables may also include:
- A pre-trained model using the assembled corpus.
- Playbooks on how to train a new model on a different corpus.
- A retrieval-augmented generation (RAG) technique able to retrieve the latest information from external sources.
The solution will:
- Use freely available open-source data or data sources available through academic partnerships.
- Be written in Python.
- Constitute a modular framework for others to interface with.
- Be deployable as a software package to a Docker container.
- Return final summary data in JSON format, along with a schema describing that format (see the sketch after this list).
The solution will not:
- Implement any kind of user authentication.
- Implement the ability to query any specific database, TIP, SIEM, or other platform that Lilly may already have deployed. We will, however, design and document the code to simplify the eventual implementation of query capabilities.
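As a hedged illustration of the Python/JSON requirements above, the sketch below classifies an indicator, builds a JSON summary, and ships a schema describing that summary. The regexes, field names, and tag vocabulary are invented for illustration only; the LLM enrichment step is marked where it would slot in.

```python
# Illustrative sketch of the modular Python/JSON interface described above.
# The regexes, field names, and tag values are placeholders, not Lilly's spec.
import json
import re

INDICATOR_PATTERNS = {
    # minimal classifiers for a few common indicator classes
    "ipv4": re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "domain": re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?:\.[A-Za-z0-9-]{1,63})+$"),
}

SUMMARY_SCHEMA = {
    # schema shipped alongside the summary, per the deliverables
    "type": "object",
    "properties": {
        "indicator": {"type": "string"},
        "class": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["indicator", "class", "tags"],
}

def classify(indicator):
    """Return the first matching indicator class, or 'unknown'."""
    for name, pattern in INDICATOR_PATTERNS.items():
        if pattern.match(indicator):
            return name
    return "unknown"

def enrich(indicator):
    """Build the JSON summary; an LLM tagging step would populate 'tags'."""
    record = {"indicator": indicator, "class": classify(indicator), "tags": []}
    return json.dumps(record)

print(enrich("198.51.100.7"))
# -> {"indicator": "198.51.100.7", "class": "ipv4", "tags": []}
```

Keeping classification, enrichment, and serialization in separate functions is one way to satisfy the modular-framework requirement: others can swap in their own classifiers or enrichment backends without touching the output contract.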
For more information, email Dr. Courtney Falk (falkc@purdue.edu).