Crypto Mining Monitoring
The goal of this project is to scrape and combine data from different websites into a single platform. The data comes from my crypto mining rig and includes: GPU specs (wattage, temperature, hashrate etc.), pool stats (hashrate on the pool side, valid/invalid shares, amount of mined coin) and profit/loss and income data from Koinly.
Why, you might ask? Because it is currently frustrating to open up all these different applications to see what is happening overall, and most of the time a lot of the data is irrelevant to me. The point of this project is to extract only the needed pieces of data and show them in one place instead of across multiple applications and webpages. A bonus goal would be to automate messages based on this extracted data: for example, if the fan speed is 0% on one of the GPUs or a GPU temperature is above a set point, I would get an email. These features are not necessary but will be included if there is enough time. The deadline for the project is 13.12.2022.
The plan is either to use an existing scraping tool like Octoparse to scrape only the useful bits of data from these different sources and push them to a platform like Power BI or a website, or to dive into Python (my coding background is very basic) and use it to scrape data from the websites or APIs via Python requests. Pulling data from APIs would be the most scalable and preferable way, but we’ll see how things work out as it could turn out difficult in some cases.
Any code used in this project can be found on my GitHub page.
Here are some examples from the different data sources:
HiveOS Web GUI
HiveOS is the OS I run on my mining rigs, and it includes a web GUI for controlling your GPUs and showing their specifications. The end goal is to extract the GPU-specific data: GPU name, hashrate, temperature, fan speed and wattage, as the rest is not useful information 95% of the time.
Herominers – Mining Pool
This is the mining pool I currently use. The data I want to extract from it is the current and 24-hour average hashrates, pending balance and total amount of mined coin.
Koinly – Crypto Finances
From Koinly I only want to extract the income data. On top of this I have an Excel file of my hardware purchases, from which I would like to extract the money spent on hardware this year and then see income minus hardware costs.
Legal side of things
Scraping straight from a website can be legal or not depending on whether the website allows scraping. To see what a website disallows, you can append ”/robots.txt” to the site’s root URL to get a list of things that are disallowed on said website. For example, Facebook disallows all scraping, and this is what their robots.txt file looks like:
However, most websites seem to allow scraping. At least in this case we seem to be in the clear, as you can see below; nothing is disallowed:
For Koinly I did not find a robots.txt file, but after reading their terms I did not find anything that would make this illegal, as I am only using my own account’s information here. Scraping another user’s data without permission would be illegal.
Testing out data scraping – Octoparse
I’m starting my testing phase by trying to scrape data from my mining pool webpage (Herominers) with Octoparse, because the free version has enough features as seen below:
Running the scraping locally is enough for a proof of concept, and I should not need more than 10 tasks to be run as I only have a few sources of data.
The goal here is to scrape my hashrate, shares, earnings and worker related data. This is what it looks like on the actual Herominers (my mining pool) webpage:
After downloading Octoparse and trying to get it to read my information from the Herominers webpage, the first problem I ran into was that it does not save the information I had already typed into the website (my crypto wallet address). This left the section empty and gave none of the data I was talking about above:
I quickly figured out that I can make the software type in text and click on objects on the webpage. Now it shows my personal information based on my crypto wallet address, and does this every time before I want to extract data:
Now it’s time to figure out how to scrape only the hashrate, share, earnings and worker related data, as there is a lot of unnecessary information like block rewards and graphs. This needs to be scraped in a form that I can easily use in Excel, since I plan on feeding it to Power BI later in the project.
After fiddling with the software for a bit I managed to create this simple workflow: enter my wallet address, click on the element to run a search based on it, and then scrape these parts of the page as text:
Next I exported this data to an Excel sheet as a proof of concept:
I also saved the whole workflow as a ”Task” in Octoparse so that I can reuse it later by just clicking ”run” on the created task. This is what that looks like:
As you can see, it only takes about three seconds to complete the scraping task, so running it locally is no problem so far.
Visualizing the scraped data
Next up I downloaded Power BI and started playing around with it to see how it works and in what way I could best visualize this small batch of data for proof of concept.
This immediately ran into problems: having each line of data in its own field/column made it unnecessarily hard to visualize, as you’d need to modify the Excel file inside Power BI to get it all under the same column. This is how it looks by default:
Merging the different columns into one through Power BI seemed like too much of a pain since I am not an experienced user, so I changed my approach to extracting the data in a different way than Octoparse. In general, the software did not feel like it was meant for what I’m doing here, or at least not a good solution for it.
Testing – Python requests – Herominers
After some trial and error I came to the conclusion that Octoparse and Power BI were hard to work with, and scraping this way felt really clumsy for my purposes as I wasn’t familiar with these applications. So now it’s time for some Python.
To start with, I read Karvinen’s post about Python and APIs. I began by trying to extract data from Herominers’ API, but the problem was that my crypto wallet address somehow needed to be included so that I could extract data specific to me. I watched this video from Exordium about how to log in to any website with Python requests. I figured out that I can get the API URL with my address in it by opening the browser’s developer tools (Inspect), going to the Network tab and typing my address into the website. Then this happens:
Now I can use the ”Request URL” that popped up when I typed my address on the page. It is also important to notice that the content-type is JSON.
This is what it looks like when opened in a browser:
For coding I am using Visual Studio Code on Windows 10 with Python and the requests library.
With this I can get the keys from the API and use them to find pieces of data like the current hashrate, which is among the things I am looking for. Calling ”herominers_API_keys.keys()” with nothing in between the brackets gives me all the keys it finds. Next I search what is inside the ”miner” key:
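In sketch form, the step above might look something like this (the URL is the ”Request URL” copied from the network tab, with my wallet address shortened to a placeholder):

```python
import requests

# sketch only: the endpoint is the "Request URL" from the browser's
# network tab, with the wallet address replaced by a placeholder
url = "https://ergo.herominers.com/api/stats_address?address=MY_WALLET_ADDRESS"

herominers_API_keys = requests.get(url).json()

print(herominers_API_keys.keys())    # list all the top-level keys
print(herominers_API_keys["miner"])  # look at what is inside the "miner" key
```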
This found ”hashrate”, ”roundscore” and ”roundHashes”, and the first one seems to match my current hashrate as reported by the pool; the pool just converts it to gigahashes instead of plain hashes. To do the same, I found this Stack Overflow thread and used this math function:
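The function is essentially a unit converter; a minimal sketch in the spirit of that Stack Overflow answer:

```python
import math

def format_hashrate(hashrate):
    # scale a raw H/s value into a human-readable unit (KH/s, MH/s, GH/s...)
    units = ["H/s", "KH/s", "MH/s", "GH/s", "TH/s"]
    if hashrate <= 0:
        return "0 H/s"
    i = min(int(math.log(hashrate, 1000)), len(units) - 1)
    return f"{hashrate / 1000 ** i:.2f} {units[i]}"

print(format_hashrate(1_250_000_000))  # -> "1.25 GH/s"
```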
I applied this to my code, and this is the end result of using the Herominers API to print out my current hashrate. It seems to work pretty well so far:
However, some of the stats seen on the website seem to be missing from this address; for example, I cannot find the 24-hour average hashrate under any of the keys. To solve this, I went back to inspect the website.
After going through the network event previews, I found that the address ”https://ergo.herominers.com/api/stats_address?address=9fbTRjsurmNp14mABUG3APak7TrxnbWnAuHv9NtW31oKPx2VgNV&recentBlocksAmount=20&longpoll=true” has all the data I’m looking for: the different average hashrates, coin payment amounts, rig-specific hashrates and submitted shares.
The same view applied in Visual Studio Code:
Applying what I learned earlier, I managed to print out all the data from Herominers that I originally wanted. I did have to modify and round the numbers, since even the amount of coin was in a strange form: my pending balance at the time was 0.829 ERGO according to the website, but the API showed the number as ”829051968”. It turns out the API reports balances in the coin’s smallest unit, nanoERG, so dividing by 10^9 gives the actual ERGO amount.
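A quick sketch of the conversion:

```python
# Herominers reports balances in nanoERG (1 ERGO = 10**9 nanoERG)
pending_nano = 829051968
pending_ergo = pending_nano / 10**9
print(f"Pending balance: {pending_ergo:.3f} ERGO")  # -> 0.829 ERGO
```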
Here are the results:
After some polishing and researching ways to publish this script’s output, I settled on something that sounds simple at first: I slightly modified the code so that it writes the output to a .txt file, which I can then show through an HTML file. While doing this conversion, I found out that the number the API uses to tell workers apart (the rig number, so to speak) changes every time the page is refreshed.
This means that sometimes index 0 is labeled ”Rig_1” and other times ”Rig_2”. I managed to fix this by assigning the rig names to their own variables, like so:
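Roughly, the idea is to stop trusting the index and pick each rig by its name instead; a sketch with an assumed JSON structure (the exact shape of the workers data is my guess):

```python
# sketch only: the structure of the "workers" data is an assumption
rig_hashrates = {}
for worker in stats["workers"]:  # stats = the parsed stats_address JSON
    rig_hashrates[worker["name"]] = worker["hashrate"]

# now the order of the list no longer matters
print("Rig_1:", format_hashrate(rig_hashrates["Rig_1"]))
print("Rig_2:", format_hashrate(rig_hashrates["Rig_2"]))
```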
Now the output always shows the correct hashrate based on the rig’s name. The rest of the code and the .txt output now look like this:
Testing – Python requests – HiveOS
Now that I’m done with Herominers, it’s time to try to get my hands on the data inside HiveOS’ API. Thankfully they have it all publicly available and well documented on their SwaggerHub page. However, after trying to replicate what I did with Herominers I ran into a new problem: even with the correct address/endpoint, I would get no output from the JSON. This is because the API is only accessible with your own personal API key.
The key was easy to get: I just had to generate it in my account settings on the website while logged in. However, I had no idea how to apply the key to my code, so I was still stuck. After some digging I found WayScript’s YouTube video about how to use Python requests with API keys. Shortly after, I came up with a test script to get a JSON of my worker 1 data from HiveOS. For reference, this is my ”worker 1”, aka Rig 1, in the GUI:
And this is what it looks like from the API through my script (I cannot show all of it because it includes sensitive information):
The way I found the right path to the worker settings was through the SwaggerHub HiveOS API page I mentioned before; it has everything documented like this:
As we can see, the output now gives us a lot of GPU specs, including power consumption, name, temperatures and card-specific hashrates. The GPUs are labeled with index numbers starting from 0, so getting the right data for each should be quite straightforward and similar to what I did with Herominers.
Notice that in the code I’m importing my API key and other sensitive variables, like my GPU farm and worker numbers, from another file (”credentials.py”), so I’m not leaking my personal information but can still share the main code publicly.
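Pieced together, the test script might look roughly like this (the endpoint path follows what I read on SwaggerHub, but treat the details as assumptions rather than my exact code):

```python
import requests
from credentials import API_KEY, FARM_ID, WORKER_ID  # kept out of the public repo

# assumption: the v2 worker endpoint as documented on HiveOS' SwaggerHub page,
# authenticated by passing the personal token as a Bearer token
url = f"https://api2.hiveos.farm/api/v2/farms/{FARM_ID}/workers/{WORKER_ID}"
headers = {"Authorization": f"Bearer {API_KEY}"}

worker = requests.get(url, headers=headers).json()
print(worker)  # the raw, undivided JSON mentioned below
```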
This however gives me the raw data, without breaking things down into keys like I did with Herominers. Fixing it did not require much; here is my solution:
Now I have a nice, clear list of keys to go through to get the GPU-specific data for ”worker 1”. After going through the keys, I found that all the data I need is located under ”gpu_stats”, under which is an index number defining which GPU’s stats we are looking at, and under that index are all the different hashrate, temperature etc. values. This is the output I got after some testing with two GPUs:
The output now includes GPU name, hashrate, core temperature, memory temperature, power usage and fan speed % (in raw numbers for now). The GPU names I type in myself, as the API does not provide the models, only the manufacturer name (”MSI”, ”ASUS” etc.), so it makes more sense to just write the whole line by myself.
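As a sketch, the per-GPU loop might look like this (the field names under ”gpu_stats” and the GPU names are assumptions for illustration):

```python
# field names inside "gpu_stats" are assumptions; raw numbers for now
gpu_names = ["MSI GeForce RTX 3070", "ASUS GeForce RTX 3060 Ti"]  # typed by hand

for i, gpu in enumerate(worker["gpu_stats"]):
    print(gpu_names[i],
          gpu["hash"],     # hashrate
          gpu["temp"],     # core temperature
          gpu["memtemp"],  # memory temperature
          gpu["power"],    # power usage in watts
          gpu["fan"])      # fan speed %
```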
After applying the hashrate formatting function from before and adding units to the other numbers, the output now looks how I’d like it to: easy to read, and it can be copy-pasted (just changing the GPU/worker number indicators) for all GPUs and workers for scalability:
The way to print the lists on separate lines I found in this Stack Overflow post.
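The trick boils down to one line:

```python
# unpack the list and use a newline as the separator
print(*gpu_lines, sep="\n")  # gpu_lines: hypothetical list of formatted rows
```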
As we can see, this GUI view is now ”perfectly” copied, and the unnecessary tuning-related numbers and other data are left out:
For worker 2 I can pretty much just copy everything GPU by GPU, except that for the 1660 Supers the memory temperatures from the API requests will show as zeros, since these GPUs don’t have memory temperature sensors (or at least they are not shown by the HiveOS software). I will still output these zeros so the lines match with the other GPUs and the overall view is easier to read. Here’s an example of what I mean (and a screenshot of the worker 2 GUI):
After adding the (same) code for all GPU indexes, this is what the output looks like now:
Testing out extracting the data
The first things that popped into my mind for extracting the data we’ve got through the APIs were tools like Huginn or a static website, the goal being that the script runs automatically. I started off by making a very simple HTML file and embedding the .txt file as an object shown on the site:
This works, but the problem is that I need to manually run the Python file to update the results. I quickly found out that PyScript would be an easy way to include Python inside HTML, but unfortunately it does not support the requests library, and alternatives like Pyodide/pyfetch don’t seem too tempting, as I would need to redo pretty much everything; I’ll only go there if I find no other way of doing this.
For now, the solution is going to be printing the output of the different scripts to text files and then showing these on the HTML page as I already demonstrated. The webpage will run on my Raspberry Pi 4, meaning a Linux operating system that hosts the website and runs the Python files automatically with crontab. For hosting I will likely use Apache2, which just makes sense since I’m using HTML.
My HiveOS-related Python file had to be slightly modified so that it also writes the output to a text file, and this is how I did it:
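The modification is small; a sketch of the idea (the variable and file names are hypothetical):

```python
# build the output as a list of lines, then write it to a text file
# that the HTML page can embed as an object
with open("hiveos_stats.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(output_lines))  # output_lines: the formatted stats rows
```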
Hosting and automating
I installed Ubuntu Desktop (22.04) on my Raspberry Pi 4 and configured crontab to run the Python scripts once every minute to start out with (with the help of this guide from CherryServers). I also installed Apache2 and copied the Python and index files to the Raspberry. I did not modify the Apache2 configuration; I only renamed the out-of-the-box index file to save it and copied my own index.html to the /var/www/html folder. However, my HTML file relied on reading the text files, and now it did not work because the text files were not in the same place. I tried to modify the HTML file by simply adding the file path to a folder on the desktop containing the text files. For some reason this did not work, and I got stuck for a while as I’ve never really used Apache2 before.
After getting frustrated with this, I decided to let crontab help me. I soon came up with the idea of simply copying the text files (which the Python scripts create in the same desktop folder they live in) from the desktop to the /var/www/html folder where Apache2’s index.html is located. So now crontab runs the Python files on my desktop, the scripts create the text files there, crontab copies these text files to the correct Apache2 folder, and the index.html reads them. Things now work in a local configuration that updates once a minute, as that’s how often I set the crontab to run. It’s worth noting that the crontab had to be edited as root (sudo crontab -e), as otherwise it won’t be able to copy the files to the Apache2 folder.
Here are some pictures of the Raspberry setup and configuration:
I ran into a small problem with the website output showing all items containing ”°C” in a strange way, because I had not specified the character set as UTF-8. The fix was simple: I just had to add charset=”UTF-8” to the meta tag in my HTML code.
Now I’m also able to load the Apache2-hosted page from another local computer, like my Windows desktop, by typing the Raspberry’s IP address into the address bar (which I have hidden in the picture):
The first thing I want to improve on the front end is how the GPU ”tables” from the HiveOS API look. At first I tried HTML and CSS, but I have very little experience with them, so I ended up solving things inside Python instead. After some digging I found that Python has a ready-made package called ”tabulate”, which automatically makes nice-looking tables from lists of items, with tons of different style options. Here is what the code and output look like when applied to my script:
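A minimal sketch of tabulate in this role (the rows are made-up example values, not my real stats):

```python
from tabulate import tabulate  # pip install tabulate

# hypothetical example rows in place of the real gpu_stats data
rows = [
    ["RTX 3070",    "61.2 MH/s", "51 °C", "86 °C", "118 W", "70 %"],
    ["RTX 3060 Ti", "60.5 MH/s", "49 °C", "84 °C", "115 W", "65 %"],
]
headers = ["GPU", "Hashrate", "Core", "Mem", "Power", "Fan"]

table = tabulate(rows, headers=headers, tablefmt="fancy_grid")

# encoding="utf-8" avoids the 'charmap' UnicodeEncodeError described below
with open("worker1_table.txt", "w", encoding="utf-8") as f:
    f.write(table)
```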
One problem occurred when I added the styling (”tablefmt”): running the code afterwards gave the error ”UnicodeEncodeError: ’charmap’ codec can’t encode characters”. This was luckily easy to solve, as it was essentially the same problem I had with my HTML after moving it to Apache2: I had to add UTF-8 encoding to my Python code. It was really simple after reading this Stack Overflow thread, where someone had the same problem.
Next I added some very basic style elements to the website to test what things look like, and it seems that we are 20 years back in time now… but it ”works”!
After studying some HTML and CSS from W3Schools, I ended up using the base of their flexbox exercise for my website to save some time since the layout fits my needs anyway. This is what the front end looks like for now:
Next I opened some ports in UFW (the firewall on my Raspberry) so I could port forward traffic from my router to the Apache2 server. This means my website can now be accessed from outside my LAN through my public IP address. For a more in-depth tutorial, you can follow the same steps I did from Bitcoin Daytrader’s YouTube video. The port forwarding part will likely differ, since every router has its own, somewhat different GUI.
After getting this to work I bought myself a domain and now you can access my website through this link: http://mikohirvela.fi/.
Adding more data to my ”dashboard”
I did not find any API documentation for Koinly, and it seemed like too much of a hassle to break things down just to get one number, so I decided to drop it for now. Instead I came up with the idea of adding a list of prices of the cryptos I personally follow. CoinMarketCap has good API documentation, so I decided to dive into it for the BTC, ETH, ERG and DOT prices and their 24h price changes, as those interest me. I found a more detailed guide in Chad Thackray’s YouTube video and came up with this solution pretty much without problems:
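A sketch of the request along the lines of CoinMarketCap’s quickstart docs (the key lives in credentials.py; the EUR conversion is just my choice here, not necessarily what the real script uses):

```python
import requests
from credentials import CMC_API_KEY  # personal key, kept in a separate file

# endpoint and header name are from CoinMarketCap's public documentation
url = "https://pro-api.coinmarketcap.com/v1/cryptocurrency/quotes/latest"
headers = {"X-CMC_PRO_API_KEY": CMC_API_KEY}
params = {"symbol": "BTC,ETH,ERG,DOT", "convert": "EUR"}

data = requests.get(url, headers=headers, params=params).json()["data"]

for symbol in ("BTC", "ETH", "ERG", "DOT"):
    quote = data[symbol]["quote"]["EUR"]
    print(f"{symbol}: {quote['price']:.2f} EUR "
          f"({quote['percent_change_24h']:+.2f} % / 24h)")
```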
Now it outputs the prices and price changes to a text file like this:
The only issue is that the free API requests are limited to 333 credits per day, or 10 000 credits per month. That should not be a big problem, as running the script once only uses 4 credits, which means I can run it 83 times a day for free. This is how I added a cronjob to run the script once an hour and copy the text file to the Apache2 folder a minute later (so there is enough time for the script to produce the updated text file before it is moved):
Updated look of the website with this new script:
Next up I wanted a script that tells me the current profitability of mining Ergo with my hashrate. To do this, I need to know the current profitability after taking the current network difficulty into account. This, however, is already solved for me by a website called ”whattomine”. I simply typed my Ergo hashrate into the site, and it generates a JSON file (which updates automatically) that tells how much Ergo I’m going to get at the current network difficulty. I made this Python script to call the correct key from that JSON file to extract the estimated rewards (I used pprint to print the keys in a list style so it’s easier to work with):
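A sketch of that script; the coin URL and the key name are assumptions on my part, since whattomine shows the exact JSON link (with your hashrate baked into the query string) once you fill in the form on the site:

```python
import requests
from pprint import pprint

# hypothetical URL: replace with the per-coin JSON link whattomine generates
url = "https://whattomine.com/coins/340.json?hr=250.0"

data = requests.get(url).json()
pprint(list(data.keys()))  # print the keys in list style to find the right one

# assumed key name for the ERG-per-day estimate
print("Estimated ERG / 24h:", data["estimated_rewards"])
```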
Now we have a working solution for extracting the current estimated amount of Ergo per 24 hours of mining with my hashrate, based on the current network difficulty. I merged this into my CoinMarketCap script so I get the output from the same Python file:
This is what it looks like after integrating it to my website:
Afterwards I added more detailed information explaining the numbers and tables shown on the dashboard and hid the text in ”tooltips”, so the dashboard stays clean-looking, but those who want to dive deeper can read the tooltips that explain what the numbers are:
Fault Tolerant Code
Now that I have all the data I want on the dashboard, it’s time to add some fault tolerance. What I mean is that currently, if one of my rigs is offline, the website will just show whatever the latest update in the text file was. So I need to modify my code so that it updates the information inside the generated text file even when the worker is offline and the script cannot fetch new data (since the JSON keys change according to whether the worker is online or offline).
At first I tried the hard route with if statements, but it was not ideal, as I needed to look for new keys and it made the code more complex than necessary. The final solution was as simple as discovering that Python has a built-in way to handle errors, try/except (if the worker is offline, my current script just raises a KeyError, because the key for the GPU stats is no longer there). With this I only need to put my existing code inside ”try:” and then add an ”except:” branch which, in case of an error, writes a different output to the text file. This is what it looks like in action when worker 1 is turned off:
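The shape of the fix, as a sketch (the helper and file names are hypothetical):

```python
try:
    # the normal path: raises KeyError when the rig is offline,
    # because "gpu_stats" is missing from the API response
    lines = build_stat_lines(worker["gpu_stats"])  # hypothetical helper
except KeyError:
    lines = ["Worker 1: OFFLINE - no stats returned by the API"]

with open("hiveos_stats.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines))
```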
Afterwards I also added more fault indicators to the total farm stats, so that it clearly tells if there are GPUs offline or if the overall hashrate is lower than expected. This is how it works when worker 1 is turned off:
The last step of fault tolerance was applying the same logic to the rest of my scripts. Here’s an example of what it looks like for the Herominers script:
Before this, the output would be missing almost all of the information if even one thing caused an error (like Rig 1 being offline), since the script would not write the current info at all in case of an error. Now it works regardless of what error occurs.
At first I thought about adding automated emails for when my rigs go offline, for example, but that sounds too old-fashioned. Instead I’ll try to make my own Discord bot that sends me notifications on my own Discord server. To start with, I created a basic bot by following Indently’s YouTube tutorial. Here’s what logging in the bot looks like:
As you can see, the script makes the bot log in via a personal key (token). The bot itself was created beforehand through the Discord developer portal. After a lot of struggle I realized this was the wrong approach for what I’m trying to achieve: the bot seems to only be able to react to things done through the Discord channel itself, so integrating my scripts into it seemed impossible with my skills. Thankfully, I found out that Discord has webhooks with a Python library. Now things are starting to work like they should, and the solution requires close to no code. Here’s a sample of how it works:
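With the discord-webhook library, the whole notification is a couple of lines; a sketch (the webhook URL comes from the channel settings and lives in credentials.py):

```python
from discord_webhook import DiscordWebhook  # pip install discord-webhook
from credentials import WEBHOOK_URL         # copied from the channel settings

# build the message and send it to the channel behind the webhook
webhook = DiscordWebhook(url=WEBHOOK_URL, content="Rig 1 is OFFLINE!")
webhook.execute()
```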
After implementing this in my HiveOS script, this is what happens when my Rig 1 is turned off:
So in the end this was as simple as creating a Discord server (which is free), creating a webhook URL in the server’s channel settings and adding 4 (well, basically 2, just used twice) lines of code to my original script. Pretty cool stuff. However, this led me to notice that at random times it took longer than a minute for the Raspberry to run the HiveOS script, causing false alarms, so I changed the cronjob interval from every minute to every 5 minutes to reduce load on the Raspberry.
I wanted to see how much traffic the Raspberry server can handle, as I had pretty much no idea. To do this I used Apache Bench, which comes with the Apache2 installation (following this tutorial). Here’s a quick look at how it works:
I did the first test with -n 100 and -c 10 and the Raspberry handled it with ease:
As we can see, all requests are handled really quickly, the slowest request taking 297 ms. If I bump up the ”number of users” (-c) from 10 to 100, performance slows down quite a bit, but nothing breaks yet:
After upping the numbers to -n 2000 and -c 1000, I started getting failed requests, and if I tried to open the page in a browser it would not load at all:
With -n 2000 and -c 600 it seemed to be just below the breaking point, as the page would still load but 2 requests failed. Throughout all the testing the CPU load only reached around 30%, so the bottleneck is something else; if I had to guess, it would be the Apache2 configuration itself. The conclusion, however, is that the Pi is more than enough to run this type of webpage even for hundreds of users.
Improving readability and modifying tables
Here’s the modified Python script and JS for the worker 1 table:
The JS code I simply found and copied from W3Schools’ tutorial. The table on the website now looks like this (and scales according to the viewport):
Now I can modify the look and content of each bit of data as I like. But before I continue modifying the table itself, I will add more specific exception handlers. For KeyErrors (the API does not find the JSON keys) I’ll simply show an ”OFFLINE!” text in bright red, as this always indicates that the rig is not seen by the API, which almost always means it is indeed offline and not mining. On top of this I will add an exception for IndexError, so I get notified on the page if, for example, I have wrong GPU numbers in the script:
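A sketch of the more specific handlers (the helper and the exact messages are mine):

```python
try:
    gpu = worker["gpu_stats"][gpu_index]
    row = build_gpu_row(gpu)  # hypothetical helper that formats one table row
except KeyError:
    # the rig is not seen by the API at all - almost always actually offline
    row = '<td style="color:red">OFFLINE!</td>'
except IndexError:
    # e.g. a wrong GPU number in the script
    row = "<td>Check the GPU indexes in the script!</td>"
```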
And here’s an example of what it now looks like if a rig is offline:
Next I made it so that if the GPU core and memory temperatures exceed or fall below certain values, they change color. Red means the value is too high, yellow means it’s close to the threshold, and green means everything is as expected:
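The color logic itself can be a small helper like this (the thresholds are made-up examples, not my real limits):

```python
def temp_color(temp, warn=55, limit=65):
    # example thresholds only; real limits differ for core and memory
    if temp >= limit:
        return "red"     # too hot
    if temp >= warn:
        return "yellow"  # close to the threshold
    return "green"       # everything is as expected

cell = f'<td style="color:{temp_color(gpu["temp"])}">{gpu["temp"]} °C</td>'
```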
Then I applied the same logic to the rest of the HiveOS script and added a hover option to see the GPU model from its index number:
Next I made the same kind of modifications to the total HiveOS stats: if all GPUs are online, ”GPUs online” is green, and if not, it turns yellow to indicate a warning. Same idea with ”GPUs offline”, but it has no color indication except red, which comes into play if any GPUs are offline. This strongly indicates that something is not working correctly:
After this I set similar rules for the cryptocurrency price table. Green indicates more than +1 %, orange indicates between +1 % and -1 %, and red indicates less than -1 %:
And here’s a picture of what the whole dashboard looks like after these changes: