Most of the work for this project was done from January to May 2017. View it on GitHub as well.
The goal of this project was to make something like Goodreads for comics: a site that aggregates data about comics. There is no central resource where comic fans can log what they’ve read and find new things to read based on their own tastes.
In its current state, the project is about 85% of the way to a first version. I put it on the back burner because it’s at a good stopping point, and the next 6 months of work on it would be all web development, which isn’t the skill set I’m trying to cultivate or showcase.
Basic User Flow
The user arrives at the home screen, where the comics are displayed. The front end is built with Bootstrap, so it’s reasonably responsive. Let’s say the user clicks on “The Flash” (second column, third row).
The user is then taken to the screen below, a display page for that particular comic. (Descriptions are still a to-do. They weren’t in the data source, so they would either have to come from somewhere else or users would need a way to fill them in; either way, they are probably necessary.)
We can see the title of the comic, the issue name (“The Rag Doll Runs Wild!”), the issue number, and the year. Comics are unusual in that it takes the title, issue name, issue number, and year together to uniquely identify an issue. Let’s say we click on “Flash – Barry Allen” in the characters section.
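Because no single field identifies an issue, deduplicating scraped records has to key on all four fields together. A minimal Python sketch, assuming records arrive as dicts with `title`, `issue_name`, `issue_number`, and `year` keys; `ComicKey`, `dedupe`, and `_norm` are hypothetical names, not code from the project:

```python
from dataclasses import dataclass


def _norm(text) -> str:
    # Collapse runs of whitespace and ignore case, so "The  Flash" == "the flash".
    return " ".join(str(text).split()).casefold()


@dataclass(frozen=True)
class ComicKey:
    """Hypothetical natural key: title + issue name + issue number + year."""

    title: str
    issue_name: str
    issue_number: str  # text, in case an issue number isn't a plain integer (assumption)
    year: int

    @classmethod
    def from_record(cls, record: dict) -> "ComicKey":
        return cls(
            title=_norm(record["title"]),
            issue_name=_norm(record["issue_name"]),
            issue_number=_norm(record["issue_number"]),
            year=int(record["year"]),
        )


def dedupe(records):
    """Keep the first record seen for each distinct ComicKey, preserving order."""
    first_seen = {}
    for record in records:
        # frozen=True makes ComicKey hashable, so it can be a dict key.
        first_seen.setdefault(ComicKey.from_record(record), record)
    return list(first_seen.values())
```

Normalizing before building the key means the same issue scraped from two differently formatted pages collapses to one record, while two issues that differ only in year stay distinct.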
We are then taken to the character page for The Flash (my personal favorite character). One of my goals for this project is to let users follow characters. That would let us further customize their recommendations based on the characters they like, and also run some interesting clustering on characters. That kind of analysis hasn’t really been done, because there is so little user data for comics, which is exactly the problem this site aims to solve.
That is all there is for now, but the project is ripe for new features. The next piece of low-hanging fruit is implementing users and shelves; there is a branch on GitHub with this about 70% implemented.
What I did for this project
The core problem I solved with this project was taking unstructured, unclean data sources (various sites on the internet), scraping a significant amount of data in a repeatable and scalable way, cleaning it, and loading it into a database that I designed. I also built a basic web app that lets users see and interact with the data.
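The scrape-and-clean step can be sketched with nothing but Python’s standard `html.parser`. Everything site-specific here is hypothetical: the table markup, the CSS class names (`title`, `issue-name`, `number`, `year`), and the function names are assumptions for illustration, not the project’s actual scrapers. Running a parser like this over HTML saved to disk, rather than live pages, is one way to keep reruns repeatable.

```python
import re
from html.parser import HTMLParser

# Hypothetical CSS classes marking the cells we care about.
FIELDS = {"title", "issue-name", "number", "year"}


class IssueTableParser(HTMLParser):
    """Collects one dict per <tr>, keyed by the class of each <td> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # dict for the row currently being read
        self._field = None  # class of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag == "td" and self._row is not None:
            cls = dict(attrs).get("class")
            self._field = cls if cls in FIELDS else None
            if self._field:
                self._row[self._field] = ""

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows and rows with no known cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. links) still lands in the current cell.
        if self._field is not None and self._row is not None:
            self._row[self._field] += data


def clean_row(raw):
    """Collapse whitespace, strip a leading '#', and pull a 4-digit year."""
    text = {k: " ".join(v.split()) for k, v in raw.items()}
    year_match = re.search(r"\d{4}", text.get("year", ""))
    return {
        "title": text.get("title", ""),
        "issue_name": text.get("issue-name", ""),
        "issue_number": text.get("number", "").lstrip("#").strip(),
        "year": int(year_match.group()) if year_match else None,
    }


def scrape_issues(html):
    parser = IssueTableParser()
    parser.feed(html)
    parser.close()
    return [clean_row(row) for row in parser.rows]
```

Keeping parsing (which is tied to one site’s markup) separate from cleaning (which is about the data itself) means a new source only needs a new parser, while the cleaning rules are reused.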
Skills This Project Highlights
- web scraping
- data munging
- database design
- ability to quickly learn new skills and frameworks (web development)
I primarily used Python for the web scraping and data munging. I used a PostgreSQL database, and made the web app in Ruby on Rails.
Database Design Diagram
Note: the Meta Characters table exists to handle the following uniqueness problem.
The Flash, for example, should be a single character. However, there have been three important, distinct Flashes: Jay Garrick, Barry Allen, and Wally West. So the Meta Characters table has one “The Flash”, and that meta character can have n Characters that all point to it. Note that it’s Characters, not Meta Characters, that are linked to the comics they appear in: it would be incorrect to link a Wally West-era (1980s) Flash comic to Jay Garrick (1940s).
Also, it’s meta characters that have character pages and will be followed by users.
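A rough sketch of how the Meta Characters relationship might look as tables, using Python’s built-in `sqlite3` as a stand-in for PostgreSQL so it runs without a server. The table and column names, and the helpers `connect` and `comics_for_meta_character`, are assumptions (the actual diagram isn’t reproduced here). The key points are that each character row points to exactly one meta character, and that appearances link comics to characters rather than to meta characters.

```python
import sqlite3

SCHEMA = """
CREATE TABLE meta_characters (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE               -- e.g. 'The Flash'
);
CREATE TABLE characters (
    id                INTEGER PRIMARY KEY,
    name              TEXT NOT NULL,        -- e.g. 'Barry Allen'
    meta_character_id INTEGER NOT NULL REFERENCES meta_characters(id)
);
CREATE TABLE comics (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    issue_name   TEXT NOT NULL,
    issue_number TEXT NOT NULL,
    year         INTEGER NOT NULL,
    UNIQUE (title, issue_name, issue_number, year)  -- the natural key
);
-- Appearances point at specific characters, never at meta characters.
CREATE TABLE appearances (
    comic_id     INTEGER NOT NULL REFERENCES comics(id),
    character_id INTEGER NOT NULL REFERENCES characters(id),
    PRIMARY KEY (comic_id, character_id)
);
"""


def connect(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def comics_for_meta_character(conn, meta_name):
    """Every comic featuring any character grouped under one meta character."""
    cur = conn.execute(
        """
        SELECT DISTINCT c.title, c.issue_number, c.year, ch.name
        FROM meta_characters m
        JOIN characters ch ON ch.meta_character_id = m.id
        JOIN appearances a ON a.character_id = ch.id
        JOIN comics c ON c.id = a.comic_id
        WHERE m.name = ?
        ORDER BY c.year, c.title, c.issue_number
        """,
        (meta_name,),
    )
    return cur.fetchall()
```

A meta character page (and, later, a follow) can then gather every issue across all of its characters with a single join, while each issue stays tied to the specific Flash who actually appeared in it. In PostgreSQL, the `INTEGER PRIMARY KEY` columns would typically become identity or `SERIAL` columns.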
If you’d like to work with me on this project…
This project is on indefinite pause because moving forward is essentially a web development problem, and I’d prefer to focus on cultivating other skills. If you are a web developer or web designer with any experience in Ruby on Rails, hit me up and we can talk about moving this forward.
This is a fun side project, and it solves (or would solve, at least) an interesting problem. I would love to scale it up to the point where we have real users and I could work on the recommendation algorithms, but I have no intention of spending my nights and weekends writing CSS.
You can check out the code for this project on GitHub.