{"id":1387,"date":"2018-10-03T09:48:37","date_gmt":"2018-10-03T07:48:37","guid":{"rendered":"https:\/\/www.smsapi.com\/blog\/?p=1387"},"modified":"2023-10-13T11:14:09","modified_gmt":"2023-10-13T09:14:09","slug":"using-puppeteer-to-find-missing-translations","status":"publish","type":"post","link":"https:\/\/www.smsapi.com\/blog\/using-puppeteer-to-find-missing-translations\/","title":{"rendered":"Using puppeteer to find missing translations"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">In the last quarter we\u2019ve been developing a multi-language version of our website. In a short time, we added 5 languages on top of the 2 base versions. More are on the horizon, but a doubt lingered: were we covering all of our text in translation?<\/span><!--more--><\/p>\n<p><span style=\"font-weight: 400;\">Our website is rendered on both the server and the client side:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8211; simple views are rendered by Node.js with translations provided by polyglot<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8211; interactive components use Vue.js with translations provided by vue-i18n<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8211; both use the same language source base<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We&#8217;ve been logging errors, and whenever we accessed a page that lacked a translation in a given language, the missing phrase was logged as well. 
This, however, left us with a few problems:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8211; we could only find missing translations while visiting the page<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8211; we had no automated way of testing<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8211; multiple visits to the same page generated multiple log entries with the same error, which made the logs difficult to read through<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In this article I will focus on catching errors from components rendered by Vue.js.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Since we had logging in place, all I had to do was trigger all possible errors to see which translations were missing. For that we needed a tool that would visit each page in every language and check the console for errors. Since we had a sitemap ready, the remaining piece was an automated browser that would walk through it and scrape the logged errors. I decided to use puppeteer, a Node.js library that provides an API to control headless Chrome.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The script has to:<\/span><\/p>\n<ol>\n<li><span style=\"font-weight: 400;\"> \u00a0<\/span> <span style=\"font-weight: 400;\">Download the list of URLs<\/span><\/li>\n<li><span style=\"font-weight: 400;\"> \u00a0<\/span> <span style=\"font-weight: 400;\">Visit each page and save any logs with missing translations from Vue.js views<\/span><\/li>\n<li><span style=\"font-weight: 400;\"> \u00a0<\/span> <span style=\"font-weight: 400;\">Print out all missing translations<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">We already have a sitemap available on the website. I\u2019m using axios to send a GET request that fetches the sitemap. 
Since the sitemap is XML, I\u2019m using <span class=\"pl-s\">xml2js<\/span> to parse it and extract the URLs.<\/span><\/p>\n<p><script src=\"https:\/\/gist.github.com\/smsapi\/4396ec3aca08dbab782e7830506abdac.js\"><\/script><\/p>\n<p><span style=\"font-weight: 400;\">I am using the ES2017 async\/await keywords to wait for the result of the axios GET request. The sitemap location depends on the environment the app is running in. You can see an example sitemap for production at<\/span><a href=\"https:\/\/www.smsapi.com\/sitemap\"> <span style=\"font-weight: 400;\">https:\/\/www.smsapi.com\/sitemap<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Note: If you\u2019re using a development environment without a valid SSL certificate, make sure to disable TLS verification.<\/span><\/p>\n<p><script src=\"https:\/\/gist.github.com\/smsapi\/d7f6715530e23c819685046e2d67c38d.js\"><\/script><\/p>\n<ol start=\"2\">\n<li><span style=\"font-weight: 400;\"> \u00a0<\/span> <span style=\"font-weight: 400;\">Visit each page and save any logs with missing translations<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">I will create a function that launches a browser, adds a listener to the console, saves every log containing a missing-translation warning, and returns the collected logs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">I\u2019ll define an async function, as most puppeteer actions are asynchronous. It will receive the array of URLs that was fetched and parsed in the first step. First, let\u2019s launch a browser.<br \/><\/span><\/p>\n<p><script src=\"https:\/\/gist.github.com\/smsapi\/3ea67d0e3f0b57c1aa40c830580ca890.js\"><\/script><\/p>\n<p><span style=\"font-weight: 400;\">Since we\u2019re assigning it to a variable, I can now reference it as if it were an actual browser. Using this reference I can open pages, switch between tabs and a lot more. 
Check the puppeteer documentation to see the possibilities.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As you can see, I\u2019m ignoring HTTPS errors since I don\u2019t use a valid SSL certificate in development (only a self-signed one).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Now I can open a page, which works like opening a new tab in the Chrome browser.<\/span><\/p>\n<p><script src=\"https:\/\/gist.github.com\/smsapi\/f544f42d2e836f81305bd200c82372ca.js\"><\/script><\/p>\n<p><span style=\"font-weight: 400;\">Before I start crawling the site, I want to add a listener on the console that will catch the logs with missing translations. Here\u2019s a sample console log:<\/span><\/p>\n<pre>[vue-i18n] Value of key 'Explore case studies and learn about Customers who are succeeding with SMSAPI:' is not a string!<\/pre>\n<p><span style=\"font-weight: 400;\">As you can see, it starts with \u201c[vue-i18n] Value \u201d, which I can match with a regex.<\/span><\/p>\n<p><script src=\"https:\/\/gist.github.com\/smsapi\/9239726e16f1576149c006354ec72821.js\"><\/script><\/p>\n<p><span style=\"font-weight: 400;\">Now I\u2019ll add a listener that pushes every unique log that starts with this phrase.<\/span><\/p>\n<p><script src=\"https:\/\/gist.github.com\/smsapi\/bfb5554a3c43f226c5e2558989ff2fb7.js\"><\/script><\/p>\n<p><span style=\"font-weight: 400;\">Now the fun part: the actual web crawling. We will loop through the list of URLs and visit each page. After we\u2019re done, we close the browser.<\/span><\/p>\n<p><script src=\"https:\/\/gist.github.com\/smsapi\/3af865726cc43fdac68eb2c2aa692f0a.js\"><\/script><\/p>\n<p><span style=\"font-weight: 400;\">There\u2019s a catch, though: it\u2019s all done asynchronously and, as you can see, I\u2019m not using Array.prototype.forEach, which does not wait for async callbacks to finish. 
Since JavaScript doesn\u2019t have a built-in asyncForEach function, we need to implement it ourselves.<\/span><\/p>\n<p><script src=\"https:\/\/gist.github.com\/smsapi\/0bfb69a8ce72ab24941a0dde66f388e2.js\"><\/script><\/p>\n<p><span style=\"font-weight: 400;\">Now puppeteer will visit each page and save the missing translations to an array. We only need to return it at the end.<\/span><\/p>\n<p><script src=\"https:\/\/gist.github.com\/smsapi\/f2969460e32d8834fc23f8ed1551791f.js\"><\/script><\/p>\n<ol start=\"3\">\n<li><span style=\"font-weight: 400;\"> \u00a0<\/span> <span style=\"font-weight: 400;\">Print all missing translations<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">Now that we have the functions needed to get the URLs and the missing translations, let\u2019s look at how we\u2019d actually use them.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To fetch the URLs and run our puppeteer crawler, we need to call both functions inside an asynchronous block or using try\/catch blocks. I will use the first approach.<\/span><\/p>\n<p><script src=\"https:\/\/gist.github.com\/smsapi\/84bc0262836c8ff7b08a91817216d6a8.js\"><\/script><\/p>\n<p><span style=\"font-weight: 400;\">Now I can print the result to the console (which is stdout in the case of Node.js).<\/span><\/p>\n<p><script src=\"https:\/\/gist.github.com\/smsapi\/e5b2ac5fa382e099e8a85f723fa51842.js\"><\/script><\/p>\n<p><span style=\"font-weight: 400;\">The last thing remaining is to exit the node process.<\/span><\/p>\n<p><script src=\"https:\/\/gist.github.com\/smsapi\/ae0340d32c5829c7614a8cf2f65b63eb.js\"><\/script><\/p>\n<p><span style=\"font-weight: 400;\">The final code looks like this:<\/span><\/p>\n<p><script src=\"https:\/\/gist.github.com\/smsapi\/7baa7574ac2aff08b1cf3c58745ddfb1.js\"><\/script><\/p>\n<p><span style=\"font-weight: 400;\">Sample output:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">[vue-i18n] Value of key 'Explore case studies and learn about Customers who are succeeding with SMSAPI:' is not a string!<\/span>\n\n<span 
style=\"font-weight: 400;\">[vue-i18n] Value of key 'form.placeholder.email' is not a string!<\/span><\/pre>\n<p><span style=\"font-weight: 400;\">The output can be redirected to a file so it can be viewed or modified later: node missing-translations-crawler.js &gt; report<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In this article I wanted to focus on logging translation errors, but the script has since been extended:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8211; the sitemap location was parameterized using <\/span><span style=\"font-weight: 400;\">commander<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8211; the language was parameterized and can be filtered from the URL list to speed up the process<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8211; each language is printed out separately in the report<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Creating a web crawler with puppeteer was a pleasure, and the entire process (crawling through 200 pages) takes less than 2 minutes. It\u2019s extremely useful for development, as it gives us more confidence when adding a new language or making changes. We\u2019re currently running it in a Jenkins job, so we always have an up-to-date report of any errors that occurred.<\/span><\/p>\n\n\n<p class=\"wp-block-paragraph\">Author: J\u00f3zef Piecyk<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the last quarter we\u2019ve been developing a multi-language version of our website. In a short time, we added 5 languages on top of the 2 base versions. 
More are on the horizon, but a doubt lingered: were we covering all of our text in translation?<\/p>\n","protected":false},"author":21,"featured_media":1478,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18],"tags":[20],"class_list":["post-1387","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tutorial","tag-english"],"_links":{"self":[{"href":"https:\/\/www.smsapi.com\/blog\/wp-json\/wp\/v2\/posts\/1387","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.smsapi.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.smsapi.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.smsapi.com\/blog\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/www.smsapi.com\/blog\/wp-json\/wp\/v2\/comments?post=1387"}],"version-history":[{"count":53,"href":"https:\/\/www.smsapi.com\/blog\/wp-json\/wp\/v2\/posts\/1387\/revisions"}],"predecessor-version":[{"id":5745,"href":"https:\/\/www.smsapi.com\/blog\/wp-json\/wp\/v2\/posts\/1387\/revisions\/5745"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.smsapi.com\/blog\/wp-json\/wp\/v2\/media\/1478"}],"wp:attachment":[{"href":"https:\/\/www.smsapi.com\/blog\/wp-json\/wp\/v2\/media?parent=1387"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.smsapi.com\/blog\/wp-json\/wp\/v2\/categories?post=1387"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.smsapi.com\/blog\/wp-json\/wp\/v2\/tags?post=1387"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}