github.com

I love Jekyll, especially the Datafiles[0] feature, which lets you use CSV/JSON/YAML files and iterate through them. Mixed with the Jekyll Data Pages generator[1], which lets you create a page for every row in your dataset, it is a very powerful combination.

However, Liquid is a terrible language for data-mangling, and simple filtering/sorting/merging can become very annoying. So I wrote a Jekyll SQLite plugin that lets you use the same data interface in Jekyll/Liquid, but backed by one or more SQLite files.
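To make the filtering/sorting/merging point concrete, here is a minimal Python sketch of the general idea. The plugin itself is a Ruby Jekyll plugin and this is not its code. The sketch runs one SQL query against a SQLite database and returns a list of row dicts, roughly the shape Liquid templates iterate over for data files. The `categories`/`products` tables are a cut-down, hypothetical stand-in for Northwind, and `query_rows` is a made-up helper.

```python
import sqlite3


def query_rows(conn: sqlite3.Connection, sql: str, params=()) -> list[dict]:
    """Run a query and return rows as plain dicts (hypothetical helper)."""
    conn.row_factory = sqlite3.Row  # rows become indexable by column name
    return [dict(row) for row in conn.execute(sql, params)]


def demo() -> list[dict]:
    conn = sqlite3.connect(":memory:")  # stand-in for a real .db file
    conn.executescript(
        """
        CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products (
            id INTEGER PRIMARY KEY, name TEXT, price REAL, category_id INTEGER
        );
        INSERT INTO categories VALUES (1, 'Beverages'), (2, 'Condiments');
        INSERT INTO products VALUES
            (1, 'Chai', 18.0, 1),
            (2, 'Chang', 19.0, 1),
            (3, 'Aniseed Syrup', 10.0, 2);
        """
    )
    # Merge (JOIN), filter (WHERE) and sort (ORDER BY) in a single statement
    return query_rows(
        conn,
        """
        SELECT p.name AS product, c.name AS category, p.price
        FROM products p JOIN categories c ON c.id = p.category_id
        WHERE p.price > ?
        ORDER BY p.price DESC
        """,
        (12,),
    )


if __name__ == "__main__":
    for row in demo():
        print(row)
```

In Liquid, the same result would typically take several chained filters plus a manual lookup to merge the two tables.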

It gives you the simplicity of the Baked Data pattern[2], and the flexibility of using SQL for data-wrangling, within a static site generator.

As a demo, I took the Northwind dataset and generated a site[3] with a few sample queries[4]. It demonstrates both site-level and page-level queries, alongside the data-pages generator, to generate a page for every product/category/customer.
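The "one page per row" half of the demo can be sketched in plain Python too. This only illustrates the concept; it is not how jekyll-datapage_gen works internally. The `customers` table, the `slugify` helper and the `customers/<slug>/index.html` layout are all hypothetical choices.

```python
import re
import sqlite3
import tempfile
from html import escape
from pathlib import Path


def slugify(text: str) -> str:
    """Lowercase and collapse non-alphanumerics into hyphens (hypothetical)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def generate_pages(conn: sqlite3.Connection, out_dir: Path) -> list[Path]:
    """Write one HTML page per customer row and return the written paths."""
    written = []
    for company, city in conn.execute(
        "SELECT company, city FROM customers ORDER BY company"
    ):
        page_dir = out_dir / "customers" / slugify(company)
        page_dir.mkdir(parents=True, exist_ok=True)
        page = page_dir / "index.html"
        # escape() keeps data values from being interpreted as HTML
        page.write_text(
            f"<h1>{escape(company)}</h1>\n<p>City: {escape(city)}</p>\n",
            encoding="utf-8",
        )
        written.append(page)
    return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (company TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("Alfreds Futterkiste", "Berlin"), ("Around the Horn", "London")],
    )
    with tempfile.TemporaryDirectory() as tmp:
        for path in generate_pages(conn, Path(tmp)):
            print(path.relative_to(tmp))
```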

I've been using this across a few sites in production for almost a year, and I'm looking for feedback on usage semantics and feature suggestions.

[0]: https://jekyllrb.com/docs/datafiles/

[1]: https://github.com/avillafiorita/jekyll-datapage_gen

[2]: https://simonwillison.net/2021/Jul/28/baked-data/

[3]: https://northwind.captnemo.in/

[4]: https://github.com/captn3m0/northwind

klandergren 1 day ago

Nice! Great to see innovation in the Jekyll space.

Quick FYI on Jekyll performance:

I noticed slow generation times on one of my sites and traced it to a plugin that was spawning a git process to get the last commit info for every page. I wrote a drop-in replacement called jekyll-last-commit[0] that uses the ruby libgit2 wrapper for improved performance. Details on its origins are in an old HN comment[1] if you are interested!

[0]: https://github.com/klandergren/jekyll-last-commit

[1]: https://news.ycombinator.com/item?id=34331663

captn3m0 1 day ago

The other last-modified plugin is now archived and no longer included in default Jekyll, but the sitemap and SEO plugins still seem to rely on the page.last_modified_at property, albeit undocumented (https://github.com/jekyll/jekyll/issues/9702), so it might be worth getting the implementation switched upstream?

We show the last-modified date on endoflife.date products (https://endoflife.date/jekyll), but I'm not sure how we have the property set right now, since I don't remember including either plugin. Thanks for creating this, I have a few other sites where this will be quite helpful.

philipwhiuk 19 hours ago

Thanks for endoflife.date!

gjtorikian 1 day ago

Oh hey that’s me! Glad you found it useful and thanks for carrying the torch forward.

mehdix 18 hours ago

I've been running my website with Jekyll for around a decade. Interestingly, I use SQLite for comments submitted to the server (cgi-bin, mind you) and populate the pages during the build. Your plugin is a perfect match for my use case. Will definitely consider it. Nice work!

zephyreon 1 day ago

I’ve always been a big fan of Jekyll because it’s so easy to use & so stable. This adds a lot of value to the ecosystem. I’ve built a bunch of faculty websites over the years where I needed a lot of structured, repetitive data (papers, honors/awards, etc.). It would have been so much easier to manage if I could have stored that data in a database instead of just flat files.

anticorporate 1 day ago

This is awesome, and would solve an actual problem I've been thinking through... if only I had selected Jekyll for my static site generator. Does anyone know of a similar solution for Hugo?

Technically I'm sure I could run a script to generate .md (or a format like .csv that Hugo can work with) from my database, but this seems like it might be easier for a database that updates frequently.

8organicbits 14 hours ago

As others have said, the plug-in system for Hugo is weaker than Jekyll. I think preprocessing is a viable alternative. I built something that generates markdown (especially the preambles) from a sqlite database. As a separate step, Hugo generates the web pages.

My code is messy, and probably not suitable for others, but it's here: https://github.com/ralexander-phi/rss-blogroll-network/blob/...

This uses a weekly web crawler run to populate the sqlite database (https://github.com/ralexander-phi/feed2pages-action/tree/mai...) and it generates a couple thousand pages which are then hosted on GitHub Pages (https://alexsci.com/rss-blogroll-network/).
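For anyone wanting to try the preprocessing route with Hugo, a minimal Python sketch of that step might look like this: read rows from SQLite and write one Markdown file per row, with YAML front matter that the generator reads as page metadata. The `posts` table and its columns are assumptions, and this is not the linked project's code.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def export_markdown(conn: sqlite3.Connection, out_dir: Path) -> list[Path]:
    """Write <slug>.md with YAML front matter for each row of `posts`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    rows = conn.execute(
        "SELECT slug, title, published, body FROM posts ORDER BY slug"
    )
    for slug, title, published, body in rows:
        # json.dumps gives a double-quoted string YAML can read, so colons
        # or quotes in titles don't break the front matter
        front_matter = "\n".join(
            [
                "---",
                f"title: {json.dumps(title, ensure_ascii=False)}",
                f"date: {published}",
                "---",
            ]
        )
        path = out_dir / f"{slug}.md"
        path.write_text(f"{front_matter}\n\n{body}\n", encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (slug TEXT, title TEXT, published TEXT, body TEXT)"
    )
    conn.execute(
        "INSERT INTO posts VALUES ('hello', 'Hello: World', '2024-01-02', 'Hi.')"
    )
    with tempfile.TemporaryDirectory() as tmp:
        for path in export_markdown(conn, Path(tmp) / "content"):
            print(path.read_text(encoding="utf-8"))
```

The generator then runs as a separate step over the generated `content/` directory, as described above.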

Imustaskforhelp 1 day ago

Oh my god, I had written the exact same comment when I saw this for the first time, and then I scrolled down to see your comment. XD Definitely felt a bit of deja vu.

Yeah, I am genuinely interested in this project, but I don't think I have the technical prowess to manage it. I am going to try anyway, but I hope others could build it as well (since I haven't created anything open source that people care about).

captn3m0 1 day ago

When I started this project, I seriously considered building it for Hugo, but the plugin system actively disallows such things; there’s no scope for dynamic plugins like these.

Alifatisk 1 day ago

This is so cool, bookmarked it. I sort of get the impression that Jekyll is slowly turning away from being a boring static site generator into something way more interesting.

berkes 1 day ago

What is wrong with "being a boring static site generator"?

Alifatisk 15 hours ago

Nothing, I am just pointing out a transition with Jekyll.

jayknight 1 day ago

Now it's easier to generate static sites that aren't so boring!

ozim 1 day ago

I might be old fashioned but this seems like something static site generators shouldn’t do.

I can see someone grabbing it and regenerating content whenever the data changes, and if the data changes often enough, you are back to dynamic templating engines. Then they complain about how awful it is for such a use case and start building a "new thing" that already existed in RoR or elsewhere.

If your data lives in a database, you're most likely better off using RoR or whatever else you fancy.

But hey, this might still be useful for some one-off jobs.

captn3m0 1 day ago

If your dataset isn’t large and your change cadence isn’t too fast, it works well.

I use it at BLR.today, where I curate events happening in Bangalore. The dataset updates roughly 4 times a day, and generating 100 pages 4 times a day is simpler than running an always on server.

The endgame for me would be to roll this into a Jekyll-lite engine running on the edge, against a SQLite database available locally to the edge compute.

amcaskill 1 day ago

We use a similar “baked data” approach with DuckDB + a static site generator in Evidence:

https://github.com/evidence-dev/evidence

captn3m0 1 day ago

This is quite interesting. I always wanted to build something similar using SteamPipe (which can pretend to be sqlite/postgres) alongside Querybook.

The craziest production use of this approach that I've seen is the crt.sh website, which builds a full dynamic website with Postgres and SQL: https://github.com/crtsh/certwatch_db/blob/master/fnc/web_ap...

mkasberg 1 day ago

I've been using Jekyll for my blog for nearly 10 years now. I also recently wrote a little plugin that uses SQLite, though I'm using it in a minimal way as a vector database rather than storing content in it right now.

https://www.mikekasberg.com/blog/2024/04/23/better-related-p...

lovasoa 1 day ago

This is a cool approach to integrating SQL with static site generation!

If you're into SQL-powered tooling, you might find something like https://sql-page.com interesting as a comparison point. It flips the model by letting you create dynamic web apps entirely in SQL, skipping the static file generation step.

Anyways it's really nice to see new tooling removing a lot of the plumbing work needed to go from database to website.

audiodude 1 day ago

I have an understanding of static site generators, I've used several of them and even written my own. But I simply don't understand what this is or what it's doing. You're using a sqlite database during site generation time instead of .md files? You're definitely not querying the db at serve time right? I'm so confused...

victorbjorklund 1 day ago

Say you have a site about pokemons. Then you can have all the pokemons in a sql database (with a schema) instead of a bunch of MD files. Not all static sites are just blogposts.

audiodude 1 day ago

Right, looking again I understand better. Instead of a bunch of shaky YAML in a md file, you have properly structured data.

Isn't this what Gatsby tries to do with GraphQL?
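To illustrate the "properly structured data" point: a SQLite schema can reject malformed records when they are loaded, which YAML front matter won't do on its own. The `pokemon` table, its columns and the sample rows are made up for this example. The `typeof()` check is there because an `INTEGER` column in an ordinary (non-STRICT) SQLite table will otherwise still accept a string like `'forty-five'`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE pokemon (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    primary_type TEXT NOT NULL,
    -- reject text and non-positive values
    base_hp INTEGER NOT NULL CHECK (typeof(base_hp) = 'integer' AND base_hp > 0)
);
"""


def try_insert(conn: sqlite3.Connection, row: tuple) -> bool:
    """Insert (name, primary_type, base_hp); return False if the schema rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO pokemon (name, primary_type, base_hp) VALUES (?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:  # NOT NULL, UNIQUE and CHECK failures
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(try_insert(conn, ("Pikachu", "Electric", 35)))         # accepted
    print(try_insert(conn, ("Bulbasaur", "Grass", "forty-five")))  # bad type
    print(try_insert(conn, ("Pikachu", "Electric", 35)))         # duplicate
    print(try_insert(conn, ("Mew", None, 100)))                  # missing type
```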

victorbjorklund 17 hours ago

Exactly. Yea, similar to what Gatsby does.

audiodude 1 day ago

Maybe I'm just not understanding the novelty of the concept. No one has done this before? That's very hard to believe.

JoelMcCracken 15 hours ago

Who is to say it hasn’t been done before? OP did it and thought it was interesting and posted about it, and enough people also thought it was interesting so they upvoted.

raminf 1 day ago

I'm a little confused. The baked-data model is so you DON'T have to generate a thousand static pages. But this solution does exactly that.

Not complaining, mind you. My kid is trying to learn HTML/CSS/JS and wants to put together a read-only website with a database backend. I'll be pointing him this way as an option once he's far enough along.

But it's still puzzling to link it to baked-data. Maybe I'm missing something.

captn3m0 1 day ago

> bundling a read-only copy of your data alongside the code for your application, as part of the same deployment

You can see https://github.com/captn3m0/northwind for example, which bundles the entire database alongside the code in the _db/northwind.db file. While Simon considers it primarily for dynamic apps, you have the ability to build PWAs and other interesting apps with the baked data pattern.

I'm building blr.today for example using this.

groby_b 1 day ago

But it's still not baked data, no? The whole point of baked data is that you don't generate static pages for every item in the data set.

Mind you, it's great to have a Jekyll plugin to do that from sqlite, it's just confusing when you call it baked data.

yboulkaid 1 day ago

Nice to see this project here! I've been using it with the Steam API to publish a list of the games I've been playing on my personal website: https://yboulkaid.com/games

I found that separating data from content on a Jekyll site is a really powerful way to build anything from photo galleries and blog entries to book lists and easily changeable menus.

chmaynard 1 day ago

This is welcome news! I already use spreadsheets and SQLite as part of my Jekyll workflow. I import spreadsheet data into SQLite, create some additional tables, and export them to CSV files. Jekyll ingests CSV files automatically and makes them available via Liquid. I look forward to learning more about this project.
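A minimal Python sketch of that spreadsheet-to-SQLite-to-CSV workflow, using only the standard library. The `papers` table, the derived `papers_per_year` table and the sample rows are hypothetical; on a real site the exported CSV would be written into Jekyll's `_data/` directory so Liquid can read it.

```python
import csv
import io
import sqlite3


def load_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> None:
    """Create `table` from CSV text; every column is stored as TEXT."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Identifiers are quoted but not otherwise sanitised: trusted input only
    cols = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)


def export_csv(conn: sqlite3.Connection, sql: str) -> str:
    """Run a query and return the result as CSV text with a header row."""
    cur = conn.execute(sql)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)
    return buf.getvalue()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "papers", "title,year\nA,2021\nB,2021\nC,2023\n")
    # An "additional table" derived from the imported spreadsheet
    conn.execute(
        "CREATE TABLE papers_per_year AS "
        "SELECT year, COUNT(*) AS papers FROM papers GROUP BY year"
    )
    print(export_csv(conn, "SELECT * FROM papers_per_year ORDER BY year"))
```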

sangeeth96 1 day ago

Although I don’t use Jekyll, loved this idea. I think using SQLite as the main data source would be a nice way to preserve content and play around with different static generators if they all had a plugin like this.

Also TIL about the baked data pattern, which I think is exactly what I needed for an upcoming project, so thanks for that. Though, I do align with one of the commenters here—this doesn’t seem like the same thing as the Baked data pattern in that Simon’s approach was about using a server rendered app with read-only data instead of generating a lot of static pages.

Nevertheless, this seems like a nice way for static generators to work with somewhat more complex data sources without dumbing them down to JSON/YAML.

andrew-jack 22 hours ago

I’m familiar with static site generators—I’ve used several and even built my own—but I’m confused about this approach. Are you using a SQLite database during site generation instead of .md files? You’re not querying the database at serve time, right?

jelled 1 day ago

Love the idea of preserving the simplicity of flat files for managing your content but dynamically loading them into a database to make sorting and filtering easier and more performant. I did something similar for my Laravel markdown blogging engine[0].

[0]: https://prezet.com/index

fredtalty5 18 hours ago

I appreciate the power of Jekyll’s Datafiles feature and how it integrates seamlessly with the Data Pages generator for creating dynamic content. However, I find Liquid challenging for data manipulation, which led me to get to know more about the Jekyll SQLite plugin that leverages SQL for more flexible data handling.

tajd 1 day ago

This is really cool. Whilst it's been fun mucking around with next.js etc. (and arguably that's for a different purpose), for an out-of-the-box website Jekyll has proven itself time and again. Looking forward to trying this out.

hk1337 1 day ago

This is interesting. I suppose if you wanted to setup your Jekyll data relationally, this would make it simpler to pull a simple list of the combined data.

I wonder if DuckDB would make it easier to do this same thing and use the existing Jekyll data files?

giancarlostoro 1 day ago

In the wake of all the WP craziness a few weeks back, I wonder how long before someone builds a "best of both worlds" CMS to rival WordPress. "You get the nice admin UI, but it generates a static site" type of thing.

abhiyerra 1 day ago

Django-Distill I found to be pretty good for this. Use Django and the admin interface while being able to generate a static site.

hombre_fatal 1 day ago

There are a lot of those. Even Wordpress has a solution for that where it generates a static site.

berkes 22 hours ago

They have been around for over a decade. Ghost, strapi, grav, etc etc.

Many are far more user-friendly than WP. All are more secure, better performing, most easier to develop on¹ and several have far better fitting architectures and concepts for common use-cases.

Yet WP continues to churn along. It has its "marketing" going for it. It has a familiar name, it is predictable (you know what cr*p and legacy you're signing up for, as a techie), and therefore it remains a popular choice. Popularity is a metric many people use to choose a tech stack, so it's a flywheel.

You've probably not heard about any of the simple, secure, fast, static-file, build-on-CI, nice-UIs CMSs out there. They are there. You can use them to replace your WP. Yes, even if you have a staff of 20+ web-editors that have never even heard about something like "CI", "Git" or commandlines.

¹ I was a Drupal and WP developer from the early 2000s to the early 2010s. I've founded a few webhosting companies specialised in WP hosting. I've helped hundreds, maybe thousands of ppl with "WP stopped working last week, can you have a look". WP development is a ghetto full of dumpster fires; there are some gems, but only if you know where to look, are very disciplined, and avoid 95% of the "streets" (ecosystem).

anamexis 1 day ago

TinaCMS, fka Forestry is trying to do this. It's been a while since I've tried them out, but I love the idea.

https://tina.io/

sofixa 1 day ago

There are a few of those, I think the term used to describe them is "headless CMS".

nikeee 1 day ago

Is it possible to generate an SQLite DB from the site data and statically serve it, so sql.js can use it as a DB to provide something like search?

captn3m0 1 day ago

I'm mostly focusing on sqlite-as-the-data-source, but I can imagine something like this might be in scope for a project like lunr.js[1], which currently uses a large index served as JS/JSON. You can write a Jekyll plugin in a few lines of ruby code (with no gem management needed, just drop it in the _plugins directory), so the "write-index-to-a-sqlite-file" part shouldn't be hard to build either.

Is FTS5 included in the SQLite browser builds? The SQLite amalgamation includes it by default.

[1]: https://lunrjs.com/docs/index.html
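On the "write-index-to-a-sqlite-file" idea, here is a rough Python sketch that builds an FTS5 full-text index into a standalone SQLite file, which a static host could then serve to an in-browser SQLite build. It assumes Python's `sqlite3` module is linked against a SQLite build with FTS5 enabled (common, but not guaranteed), and the `search` table layout and sample documents are hypothetical.

```python
import os
import sqlite3
import tempfile


def build_search_index(path: str, docs: list[tuple[str, str, str]]) -> None:
    """Write (url, title, body) rows into an FTS5 table in a SQLite file."""
    conn = sqlite3.connect(path)
    try:
        with conn:
            conn.execute("DROP TABLE IF EXISTS search")
            # url is stored but not tokenized; title and body are indexed
            conn.execute(
                "CREATE VIRTUAL TABLE search USING fts5(url UNINDEXED, title, body)"
            )
            conn.executemany(
                "INSERT INTO search (url, title, body) VALUES (?, ?, ?)", docs
            )
    finally:
        conn.close()


def search(path: str, query: str, limit: int = 10) -> list[str]:
    """Return URLs of matching documents, best match first."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT url FROM search WHERE search MATCH ? ORDER BY rank LIMIT ?",
            (query, limit),
        ).fetchall()
    finally:
        conn.close()
    return [url for (url,) in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "search.db")
        build_search_index(
            db,
            [
                ("/a/", "SQLite plugin", "Query SQLite from Liquid"),
                ("/b/", "Hugo notes", "Static sites in Go"),
            ],
        )
        print(search(db, "sqlite"))
```

`ORDER BY rank` relies on FTS5's built-in relevance ranking (BM25 by default).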

ThatPlayer 1 day ago

Lunr.js isn't really being developed anymore from what I remember. I switched from Lunr.js to SQLite FTS in browser myself. I'm using sqlite-wasm-http [0], which uses range requests to only pull relevant pages of the static database as needed. Though if your search query is short enough, it'll probably pull the entire FTS table anyways.

[0] https://github.com/mmomtchev/sqlite-wasm-http

Imustaskforhelp 1 day ago

I wish there were a Hugo alternative to this. I guess I am going to go into the weeds of this project to replicate it in Go.

arp242 1 day ago

Hugo more or less is "Jekyll, replicated in Go".

The core problem you will run into is that Go is a very static language, and Ruby is a very dynamic "free for all" language. There are obviously up- and downsides to both approaches, but IMHO Ruby is clearly a much better fit for this sort of thing. On my own Jekyll website I do syntax highlighting with Vim's :TOhtml. I like how my Vim looks and I know how to tweak Vim syntax files. Kind of crazy I guess, but it works, and it's actually very little code. It would be hard to replicate this in Hugo, or any other Go-based site generator.

Similarly, doing a plugin like jekyll-sqlite will be hard. You can bake it in of course, but people experimenting with random stuff like this? Not going to happen with Go.

Not that Hugo can't be improved on by the way – I have generally found using Hugo to be highly complex and a forest of weird confusing errors that don't make much sense. But you will never really replicate Jekyll in Go.

I like Go. I have written tons of Go code over the last 10 years. But for some things it's just not a great fit, and this is one of those things.

captn3m0 1 day ago

Hugo got a WASM-based plugin system, but real scripting plugins, which SQLite support would need, are still a feature request: https://github.com/gohugoio/hugo/issues/5510

arp242 20 hours ago

That would be an improvement, but it still wouldn't be equivalent to what you can do with Ruby and Jekyll. For example, I do [1] so I don't need to put dates in my post names, which also works around a bug [2] I encountered that was never fixed.

[1]: https://stackoverflow.com/a/68287682/660921

[2]: https://github.com/jekyll/jekyll/issues/8707

tonymet 1 day ago

Great project. I've long thought about creating a static-site transformer for WordPress sites via SQLite, in order to reduce costs and improve security.

The idea would be to feed the WordPress content into a SQLite DB and re-publish the entire site as a static site. Since WordPress comments have declined in usage, this should work well.

Publishing time would be a bit slower, but reads will be 100x faster and 10000x cheaper.

captn3m0 1 day ago

WordPress now has (almost production-ready) SQLite support, and there are plugins like https://wordpress.org/plugins/simply-static/, so this might already be possible.

midzer 1 day ago

Wow, cool stuff!

captn3m0 1 day ago

Thanks!

jordanmorgan10 1 day ago

I have nothing to add other than Jekyll has been rock solid for me for, what, like a decade plus now?

I've run swiftjectivec.com on it, and it's always been the perfect middle ground: it takes care of the cruft I don't want to deal with, while allowing me to get my hands dirty and code when I want. Some of my favorite software ever.

rapnie 1 day ago

I have some old jekyll websites that I only very infrequently need to update. Each time something in the ruby / gem / bundler / jekyll chain setup is broken, and with some weird errors it is stackoverflow search time. Very time consuming, highly annoying. I postponed my last typo correction, to first make up my mind on whether to port to astro that I use currently.

diggan 1 day ago

> , to first make up my mind on whether to port to astro that I use currently.

Just sucks that eventually, that will happen with whatever you use after migrating too, the only difference is how long it takes.

I'm hoping NixOS or even just Nix for dev envs (or something similar) will help against this, so you end up with environments that just keep working.

bbkane 1 day ago

I actually switched my blog to use Zola (similar to Jekyll but packaged as a static binary instead of a Ruby gem) because I couldn't figure out how to build my site with Jekyll after a few years- it kept trying to compile C code?

Bear in mind this was 5 years ago and I had never used Ruby before, so probably a user error :)

Glad it's been so stable for you!

Tomte 1 day ago

> it kept trying to compile C code?

That's something that works incredibly well on Windows, better than I would have expected.

The rubyinstaller.org people are shipping a fantastic installation. Under the hood it's msys2, but everything simply works out of the box.

captn3m0 1 day ago

I built endoflife.date with it, and it has been great. If I had to do it again, I might pick Mediawiki (or something similar) due to it being a community wiki more than a static site, but Jekyll hasn’t let us down yet.

freedomben 1 day ago

Really appreciate you building and maintaining it these years! endoflife.date has been an amazing resource ever since you put it up. This has been a great open source success story that I share with people to explain why open source can be great. It's also a great example of the power of crowd-sourcing data. I originally added Ruby[1], Fedora[2], and Alma Linux[3] to endoflife.date and it's rewarded me with years of ability to reference. Prior to this I kept my own notes for the projects I cared most about, but keeping those up to date was a PITA. The best ideas seem obvious in hindsight, and endoflife.date definitely seems obvious :-D

[1] https://github.com/endoflife-date/endoflife.date/commit/7dae...

[2] https://github.com/endoflife-date/endoflife.date/commit/ab16...

[3] https://github.com/endoflife-date/endoflife.date/commit/12a7...

captn3m0 1 day ago

Thanks a ton!

cannibalXxx 1 day ago

In this article I show you a project on how to develop a static blog in jekyll. https://chat-to.dev/post?id=296

charles_f 1 day ago

In this one I show how to host it in various stupid ways https://fev.al/posts/blog-infra/

Though the idea of using a database to store the content could make things even better. Maybe sprinkle some redundant postgres?

raminf 1 day ago

When you went from Jekyll to Kubernetes, it must have been like when Ted Kaczynski first learned about battery-operated timed fuses.

Enjoyed the read.

Looking forward to when you chuck the whole thing into the bin and run Wordpress on your linux home server, fronted by a free Cloudflare zero trust tunnel.