Commit

first hackathon post
Matt Hall authored and Matt Hall committed Dec 2, 2023
1 parent 21aa55d commit 0057ce7
Showing 3 changed files with 53 additions and 1 deletion.
8 changes: 7 additions & 1 deletion blog/hackathon-season.md
@@ -15,6 +15,12 @@ I'm on my way home from a hackathon right now, my first one for about a year. It

## An invitation

If you're not quite ready to host one, come and experience one instead. [The FORCE Language Modeling Hackathon](https://www.npd.no/en/force/events/in-person--the-npd-language-modelling-hackathon-2023/) is happening in Stavanger, Norway, on 30 November and 1 December 2023. I'm stoked to be hosting, with Peter Bormann organzing. The plan is to fine-tune some language models with open subsurface data from the Norwegian shelf. **What would you ask a virtual assistant that has read everything about _your_ project?**
If you're not quite ready to host one, come and experience one instead. [The FORCE Language Modeling Hackathon](https://www.npd.no/en/force/events/in-person--the-npd-language-modelling-hackathon-2023/) is happening in Stavanger, Norway, on 30 November and 1 December 2023. I'm stoked to be hosting, with Peter Bormann organizing. The plan is to fine-tune some language models with open subsurface data from the Norwegian shelf. **What would you ask a virtual assistant that has read everything about _your_ project?**

<div style="text-align: center">[**Find out more and sign up!**](https://www.npd.no/en/force/events/in-person--the-npd-language-modelling-hackathon-2023/)</div>

---

### Changelog

- **2023-12-01** &mdash; fixed typo
46 changes: 46 additions & 0 deletions blog/the-chatbots-are-coming/index.qmd
@@ -0,0 +1,46 @@
---
title: "The chatbots are coming"
author: "Matt Hall"
date: "2023-12-02"
description: "What happened at the FORCE LLM hackathon in Stavanger"
---

🤖 **This week, seven teams of scientists and data scientists collaborated to explore ideas in large language modeling applied to a large new open dataset. Here's what happened.**

[The FORCE consortium](https://www.npd.no/en/force/), which has hosted hackathons and data science contests before (read [this](https://agilescientific.com/blog/2018/9/27/force-ml-hackathon-project-round-up) and [that](https://agilescientific.com/blog/2019/10/11/force-ml-2019-project-round-up)), hosted another groundbreaking event this week &mdash; [the NPD language modeling hackathon](https://www.npd.no/en/force/events/in-person--the-npd-language-modelling-hackathon-2023/). The event took place at the NPD in Stavanger, Norway, from 29 November to 1 December 2023.

As in past years, the event was organized by a small team coordinated by **Peter Bormann** (ConocoPhillips), who not only believes passionately in the importance of open collaboration but is committed to acting on that belief. 🙌 Scroll down for the rest of the organizational credits.

One major feature of this event was the large new dataset the team has assembled. This contains almost three million pages of text from various reports published by the Norwegian Petroleum Directorate, [Netherlands Oil and Gas](https://www.nlog.nl/en) and [the UK North Sea Transition Authority](https://www.nstauthority.co.uk). It will soon be published under [the NLOD 2.0 licence](https://data.norge.no/nlod/en/2.0), and should be an exciting resource for the community; we just want to make sure we have taken reasonable steps to protect people's privacy before publishing it.


## Projects

Here's a very quick rundown of the teams that formed at what I believe was the first public LLM-based hackathon in Norway or in the energy sector (AkerBP [ran one of their own](https://akerbp.com/en/aker-bp-hackathon-2023-a-celebration-of-data-science/) a few weeks ago):

<div style="float:right; margin-left:12px"><img src="chat-bot-gen-art.png" width="300px" /></div>

- **Anonymizers** &mdash; Masking personally identifiable information in public datasets.
- **Embedding enthusiasts** &mdash; Fine-tuning an embeddings model, using cleaner data.
- **Zero-shot chatbots** &mdash; What kind of questions can chatbots answer about the dataset?
- **Knowledge-graphers** &mdash; Extracting a knowledge graph and providing it to chatbots.
- **Q&A generators** &mdash; Generating question-answer pairs for fine-tuning Q&A chatbots.
- **Metadata extractors** &mdash; Automatically pulling metadata from the dataset.

Tomorrow I will put up another post describing the projects in more detail. [When it's up, you can click here to read it!](/blog/force-hackathon-projects.html)


## Credits

It takes a community of organizers to pull off a community event like this. Here's a probably incomplete list; apologies if I missed anyone ([drop me a line!](mailto:[email protected])):

- **Jesse Lord** ([Kadme](https://kadme.com) and [Fabriq](https://npd.fabriqai.com)) for the dataset, which will soon be released under an open license.
- **Lukas Mosser** ([AkerBP](https://akerbp.com/en/)) for the starter notebook and know-how.
- **Paul Cleverley** ([Infoscience](https://infosciencetechnologies.com)) for the named entity tags.
- **Eirik Haughom** and **Frode Odinsen** ([Microsoft](https://www.microsoft.com/)) for the in-event Azure support.
- **The NPD**, especially **Janke Ro** and those involved in FORCE.
- It was my privilege to facilitate the proceedings, a job I am ill-suited for but enjoy anyway. 😅 Thank you to Peter for the opportunity!

---

<span style="font-size:80%">_Generated art by Dylan Loss_</span>
