Skip to content
Snippets Groups Projects

Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Source

Select target project
No results found

Target

Select target project
  • fact/data/FACT_jobs
1 result
Show changes
Commits on Source (20)
# FACT Employment Official Statistics Changelog
All notable changes to this project will be documented in this file.
Release mentioned here correspond to docker registry entries and use semantic versioning as in 'MAJOR.MINOR.PATCH'.
## Change entries
Added: For new features that have been added.
Changed: For changes in existing functionality.
Deprecated: For once-stable features removed in upcoming releases.
Removed: For features removed in this release.
Fixed: For any bug fixes.
Security: For vulnerabilities.
## [0.2.0] - ongoing
Added:
- license
- changelog
- contributing
Changed
- added xxx BLS datasets
- added xxx EUROSTAT datasets
Fixed
- database tables partitioning on extended state/country ISO3 codes
## [0.1.0] - 2023-10-19
Added:
- first full structure and dataset creation
\ No newline at end of file
# Contributing to this project
Thank you for your interest in contributing to FACT! We welcome contributions from everyone and value your input and efforts to improve this project. Below, you will find guidelines and instructions on how to contribute effectively and ensure a smooth collaboration process.
## Ways to Contribute
There are many ways to contribute to our project:
- **Reporting bugs**: File issues for any bugs you encounter.
- **Feature suggestions**: Suggest new features or improvements to existing features.
- **Documentation**: Help improve the project documentation.
- **Code contributions**: Contribute bug fixes or implement new features.
## Submitting Contributions
Please follow these steps to submit your contributions:
1. **Create a branch** for your changes.
2. **Make your changes**: Follow our coding guidelines and write clean, testable Go code.
3. **Write tests**: Add or update tests as necessary for your changes.
4. **Run all tests** to ensure nothing else was accidentally broken.
5. **Submit a pull request**:
- Push your branch to Gitlab and open a pull request against the main branch.
- Provide a clear description of the changes and any relevant issue numbers.
## Style Guide and Coding Conventions
- Use clear and descriptive variable names.
- Document all public functions and packages.
- Keep functions focused and well-organized within packages.
- Update changelog accordingly
## Review Process
Our team will review all pull requests as soon as possible. During the review, we may ask for additional changes or clarification. Pull requests must be approved by at least one maintainer before merging.
Contributions that include new features or substantial changes should be discussed in the issue tracker or discussions before starting work.
Thank you again for considering contributing to FACT. We look forward to your contributions and are excited to see what we can achieve together!
FROM mariadb:11.4
# copy directories with scripts
COPY inputs/update_sys.sh /root/inputs/update_sys.sh
RUN chmod a+x /root/inputs/update_sys.sh
COPY dump /docker-entrypoint-initdb.d
# update system for security reasons
RUN $HOME/inputs/update_sys.sh
FACT_jobs_db.png

41.5 KiB

This diff is collapsed.
# FACT Employment Official Statistics
Official labor statistics about the US and European countries. Data sources are the Bureau of Labor Statistics (BLS) and the Census in the US and EUROSTAT in Europe. They vary in frequency of observation (either annually or quarterly), geographical scope, and industrial details.
---
## Use
1. Connect to the GitLab container registry and pull the image:
> docker pull registry.git.nilu.no/fact/data/fact_jobs:latest
2. Run this data service:
> docker run -t -i --name fact\_jobs -e MARIADB\_DATABASE=FACT_jobs -e MYSQL\_ROOT\_PASSWORD=devops -p 3308:3306 -d registry.git.nilu.no/fact/data/fact\_jobs:0.1
The container makes available a MariaDB instance with the full database on employment. It is reachable on port 3308 of the localhost and the root password is 'devops'.
---
## Specifications
### Data size
```
+--------+-----------+----------+
| Table | Size (MB) | Rows (#) |
+--------+-----------+----------+
| LFS | 2.35 | 43278 |
| LODES8 | 2323.16 | 41592806 |
| QCEW | 6916.19 | 59684807 |
| REA | 22.52 | 418223 |
| SBS | 1.54 | 25938 |
+--------+-----------+----------+
```
### Data structure
![Database tables](FACT_jobs_db.png "Title")
### Data fields
#### LODES8 - LEHD Origin-Destination Employment Statistics, the U.S. Census
Data refers to **jobs**.
| Column | Description |
|---|---|
|GeoID| FIPS block id, 15 chars: STATE+COUNTY+TRACT+BLOCK |
|Year| 2002-2021|
|CNS01 | Number of jobs in NAICS sector 11 (Agriculture, Forestry, Fishing and Hunting)|
| CNS02 | Number of jobs in NAICS sector 21 (Mining, Quarrying, and Oil and Gas Extraction)|
| CNS03 | Number of jobs in NAICS sector 22 (Utilities)|
| CNS04 | Number of jobs in NAICS sector 23 (Construction)|
| CNS05 | Number of jobs in NAICS sector 31-33 (Manufacturing)|
| CNS06 | Number of jobs in NAICS sector 42 (Wholesale Trade)|
| CNS07 | Number of jobs in NAICS sector 44-45 (Retail Trade)|
| CNS08 | Number of jobs in NAICS sector 48-49 (Transportation and Warehousing)|
| CNS09 | Number of jobs in NAICS sector 51 (Information)|
| CNS10 | Number of jobs in NAICS sector 52 (Finance and Insurance)|
| CNS11 | Number of jobs in NAICS sector 53 (Real Estate and Rental and Leasing)|
| CNS12 | Number of jobs in NAICS sector 54 (Professional, Scientific, and Technical Services)|
| CNS13 | Number of jobs in NAICS sector 55 (Management of Companies and Enterprises)|
| CNS14 | Number of jobs in NAICS sector 56 (Administrative and Support and Waste Management and Remediation Services)|
| CNS15 | Number of jobs in NAICS sector 61 (Educational Services)|
| CNS16 | Number of jobs in NAICS sector 62 (Health Care and Social Assistance)|
| CNS17 | Number of jobs in NAICS sector 71 (Arts, Entertainment, and Recreation)|
| CNS18 | Number of jobs in NAICS sector 72 (Accommodation and Food Services)|
| CNS19 | Number of jobs in NAICS sector 81 (Other Services [except Public Administration])|
| CNS20 | Number of jobs in NAICS sector 92 (Public Administration)|
#### QCEW - Quarterly Census of Employment and Wages, the U.S. Bureau of Labor Statistics
| Column | Description |
|---|---|
|GeoID| FIPS: US, STATE, COUNTY |
|Year| 2000-2023|
|Naics| Industry codes|
|Agglvl_code |14 National, by NAICS Sector; 15 National, by NAICS 3-digit; 16 National, by NAICS 4-digit; 17 National, by NAICS 5-digit; 18 National, by NAICS 6-digit; 54 Statewide, NAICS Sector; 55 Statewide, NAICS 3-digit; 56 Statewide, NAICS 4-digit; 57 Statewide, NAICS 5-digit; 58 Statewide, NAICS 6-digit; 74 County, NAICS Sector; 75 County, NAICS 3-digit; 76 County, NAICS 4-digit; 77 County, NAICS 5-digit; 78 County, NAICS 6-digit |
|Q1_establishments| Number of establishments in quarter 1|
|Q1_disclosure| Percentage of establishments with disclosed information in quarter 1|
|Q1_avg_weekly_wage| Average weekly wage in quarter 1|
|Jan_jobs| Number of jobs in January|
|...|...|
#### REA - Regional Economic Accounts, EUROSTAT
Data refers to **employed persons**.
| Column | Description |
|---|---|
|GeoID| NUTS 0 - 3 |
|Year| 1995-2022|
|Nace | Reduced level 1 NACE, Rev. 2|
|EmpTh | Thousands of Employed Persons|
Source: [doi:10.2908/NAMA_10R_3EMPERS](https://doi.org/10.2908/NAMA_10R_3EMPERS)
#### LFS - Labour Force Survey, EUROSTAT
Data refers to people of any sex and any age > 15 years old who are **employed persons**.
| Column | Description |
|---|---|
|GeoID| NUTS 0 - 2 char country code |
|Year| 2008-2023|
|Nace | Level 2 NACE, Rev. 2|
|EmpTh_Q1 | Thousands of Employed Persons in Quarter 1|
|EmpTh_Q2 | Thousands of Employed Persons in Quarter 2|
|EmpTh_Q3 | Thousands of Employed Persons in Quarter 3|
|EmpTh_Q4 | Thousands of Employed Persons in Quarter 4|
Source: [doi:10.2908/LFSQ_EGAN22D](https://doi.org/10.2908/LFSQ_EGAN22D)
#### SBS - Structural Business Statistics, EUROSTAT
Data include only G data source below for now and hence are limited to 2021.
| Column | Description |
|---|---|
|GeoID| NUTS 0 - 2 char country code |
|Year| 2008-2022|
|Nace | Level 4 NACE, Rev. 2|
|Enterprises| Number of enterprises |
|Employment| Persons employed - number |
|LaborCost| Unit labor cost per person employed, thousand euro|
Source: [doi:10.2908/SBS_OVW_ACT](https://doi.org/10.2908/SBS_OVW_ACT)
---
## Notes
Data is selected, downloaded, and reorganized from multiple data sources.
### LODES
The U.S. Census publishes regularly the [Longitudinal Employer-Household Dynamics - LEHD](https://lehd.ces.census.gov/data/) and within that the [LEHD Origin-Destination Employment Statistics - LODES](https://lehd.ces.census.gov/data/lodes/).
The data considered here is the Workplace Area Characteristics (WAC) of LODES8, where the version number indicates the TIGER geographical boundary specification adopted (2020 Census blocks). Details are at https://lehd.ces.census.gov/data/lodes/LODES8/LODESTechDoc8.0.pdf.
The overall period is 2002-2021 but not all states are represented in all periods. All 51 states are available for only the period 2011-2016.
### BLS
BLS data could not be downloaded though API because of limitations on daily number of combination of series and time periods. QCEW data, however, is published also in tabular formats https://www.bls.gov/cew/downloadable-data-files.htm.
Percentage of disclosure refers to number of establishments and is much influenced by the fact that data is reported at different ownership codes and without totals. The more details means less disclosure, of course.
```
+-------------+--------------------------------+
| Agglvl_code | TRUNCATE(AVG(Q1_disclosure),4) |
+-------------+--------------------------------+
| 14 | 0.9999 |
| 15 | 0.9990 |
| 16 | 0.9992 |
| 17 | 0.9980 |
| 18 | 0.9975 |
| 54 | 0.9920 |
| 55 | 0.9536 |
| 56 | 0.8972 |
| 57 | 0.8534 |
| 58 | 0.7892 |
| 74 | 0.7723 |
| 75 | 0.6351 |
| 76 | 0.5094 |
| 77 | 0.4270 |
| 78 | 0.3971 |
+-------------+--------------------------------+
```
### EUROSTAT
#### REA
**A**. *Employment (thousand persons) by NUTS 3 regions (Level 1 NACE)*
https://ec.europa.eu/eurostat/databrowser/product/view/nama_10r_3empers
**B**. *Compensation of employees by NUTS 2 regions (Level 1 NACE)*
https://ec.europa.eu/eurostat/databrowser/product/view/nama_10r_2coe
**C**. *Employment (thousand hours worked) by NUTS 2 regions (Level 1 NACE)*
https://ec.europa.eu/eurostat/databrowser/product/view/nama_10r_2emhrw
#### LFS
**D**. *Employment by sex, age and detailed economic activity (from 2008 onwards, NACE Rev. 2 two digit level) - 1 000*
https://ec.europa.eu/eurostat/databrowser/product/view/lfsq_egan22d
**E**. *Employment by sex, age, economic activity and NUTS 2 regions (NACE Rev. 2 level 1) (1 000)*
https://ec.europa.eu/eurostat/databrowser/product/view/lfst_r_lfe2en2
**SBS**
**F**. *SBS data by NUTS 2 regions and NACE Rev. 2 (from 2008 onwards)*
https://ec.europa.eu/eurostat/databrowser/product/view/sbs_r_nuts06_r2
**G**. *Enterprises by detailed NACE Rev.2 activity and special aggregates (from 2021)*
https://ec.europa.eu/eurostat/databrowser/product/view/sbs_ovw_act
**H**. *SBS historical data 2011-2020 - Country, 4-digits*
Annual detailed enterprise statistics for industry (NACE Rev. 2, B-E)
https://ec.europa.eu/eurostat/databrowser/product/view/sbs_na_ind_r2
Annual detailed enterprise statistics for construction (NACE Rev. 2, F)
https://ec.europa.eu/eurostat/databrowser/product/view/sbs_na_con_r2
Annual detailed enterprise statistics for services (NACE Rev. 2 H-N and S95)
https://ec.europa.eu/eurostat/databrowser/product/view/sbs_na_1a_se_r2
Annual detailed enterprise statistics for trade (NACE Rev. 2 G)
https://ec.europa.eu/eurostat/databrowser/product/view/sbs_na_dt_r2
| Ref.| Vars | NUTS | TIME| NACE | Notes
|---|---|---|---|---|---|
| A | e | 3 | 95/21 annual | 1r | 11 industries
| B | s | 2 | | 1r | |
| C | e | 2 | 95/21 annual | 1r ||
| D | e | 0 | 08/23 quarterly | 2 ||
| E | e | 2 | 08/22 annual | 1r | 10 industries
| F | e,u,s | 2 | 08/20 annual | 2 (3 G) | no Ag |
| G | e,u,s | 0 | 21 annual | 4 | no Ag |
| H | e,u,s | 0 | 05/20 annual | 4 | no Ag, split in parts |
>Vars: e=employment #; u=units/enterprises; s=salary/wages
Most up-to-date: D
Highest frequency: D
Most industry detail: G, H
Most space detail: A
```
^ NUTS
|
3 | A
|
2 | B,C,E F
|
1 |
|
0 | D G,H
| NACE
----------------------------------->
1r 2 3 4
```
We select D as main reference, and aim at expanding to G,H (for NACE) and to A (for NUTS).
We not consider further B,C,E, and possibly F, which in contrast could be used for validation.
Note that only F, G, H have info on units/enterprises and salaries.
```
^ NUTS
|
3 | REA(A)
|
2 |
|
1 |
|
0 | LFS(D) SBS(G,H)
| NACE
----------------------------------->
1r 2 3 4
REA: Regional Economic Accounts
SBS: Structural Business Statistics
LFS: Labour Force Survey
```
### Authors
Riccardo Boero - ribo@nilu.no
### Licenses
All the code and the resulting docker images in this project are subject to:
- [![AGPL v3](https://www.gnu.org/graphics/agplv3-88x31.png "License Icon")](https://www.gnu.org/licenses/agpl-3.0.en.html#license-text)
The input data is subject to:
- [Eurostat data](https://ec.europa.eu/eurostat/about-us/policies/copyright)
- [BLS Terms of Service](https://www.bls.gov/developers/termsOfService.htm)
- [Census Terms of Service](https://www.census.gov/data/developers/about/terms-of-service.html) This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau.
#!/bin/bash
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
echo "*** Dump database:"
mkdir -p ./dump
mariadb-dump -h "$HOST" -u "$USER" -p"$PASSWORD" --databases FACT_jobs > ./dump/db_jobs.sql
echo "*** Drop database:"
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" < inputs/sql/drop_db.sql
#!/bin/bash
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
for y in {2000..2023}
do
mkdir -p ./temp
wget -N "https://data.bls.gov/cew/data/files/${y}/csv/${y}_qtrly_singlefile.zip" -P ./temp
unzip -d ./temp/ ./temp/${y}_qtrly_singlefile.zip
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" -e "CREATE TEMPORARY TABLE FACT_jobs.staging (GeoID varchar(5) NOT NULL, Year INT NOT NULL, own_code varchar(1) NOT NULL, Naics varchar(6) NOT NULL, Agglvl_code varchar(2) NOT NULL, size_code varchar(1) NOT NULL, Qtr varchar(1) NOT NULL, disclosure_code varchar(1), Quarterly_establishments INT, Jobs_month1 INT, Jobs_month2 INT, Jobs_month3 INT, Avg_weekly_wage INT); LOAD DATA LOCAL INFILE \"temp/${y}.q1-q4.singlefile.csv\" INTO TABLE FACT_jobs.staging FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2, @col3, @col4, @col5, @col6, @col7, @col8, @col9, @col10, @col11, @col12, @col13, @col14, @col15, @col16, @col17, @col18, @col19, @col20, @col21, @col22, @col23, @col24, @col25, @col26, @col27, @col28, @col29, @col30, @col31, @col32, @col33, @col34, @col35, @col36, @col37, @col38, @col39, @col40, @col41, @col42) SET GeoID=@col1, Year=${y}, own_code=@col2, Naics=@col3, Agglvl_code=@col4, size_code=@col5, Qtr=@col7, disclosure_code=@col8, Quarterly_establishments=@col9, Jobs_month1=@col10, Jobs_month2=@col11, Jobs_month3=@col12, Avg_weekly_wage=@col16; CREATE TEMPORARY TABLE FACT_jobs.staging2 (GeoID varchar(5) NOT NULL, Year INT NOT NULL, Naics varchar(6) NOT NULL, Agglvl_code varchar(2) NOT NULL, Qtr varchar(1) NOT NULL, Disclosure_percent DOUBLE, Quarterly_establishments INT, Jobs_month1 INT, Jobs_month2 INT, Jobs_month3 INT, Avg_weekly_wage DOUBLE); INSERT INTO FACT_jobs.staging2 SELECT GeoID, Year, Naics, Agglvl_code, Qtr, (1.0 - (CASE WHEN SUM(Quarterly_establishments) = 0 THEN 0 ELSE (SUM(CASE WHEN disclosure_code = \"N\" THEN Quarterly_establishments ELSE 0 END) / SUM(Quarterly_establishments)) END )), SUM(Quarterly_establishments), SUM(Jobs_month1), SUM(Jobs_month2), SUM(Jobs_month3), (CASE WHEN sum(Jobs_month1 + Jobs_month2 + Jobs_month3) = 0 THEN 0 ELSE sum(Avg_weekly_wage * Jobs_month1 + Avg_weekly_wage * Jobs_month2 + Avg_weekly_wage * Jobs_month3) / sum(Jobs_month1 + Jobs_month2 + Jobs_month3) END) FROM FACT_jobs.staging WHERE ((Agglvl_code > 13 AND Agglvl_code < 19) OR (Agglvl_code > 53 AND Agglvl_code < 59) OR (Agglvl_code > 73 AND Agglvl_code < 79)) AND own_code > 0 AND size_code = 0 GROUP BY GeoID, Year, Qtr, Agglvl_code, Naics; INSERT INTO FACT_jobs.QCEW (GeoID, Year, Naics, Agglvl_code, Q1_establishments, Q1_disclosure, Q1_avg_weekly_wage, Jan_jobs, Feb_jobs, Mar_jobs) SELECT GeoID, Year, Naics, Agglvl_code, Quarterly_establishments, Disclosure_percent, Avg_weekly_wage, Jobs_month1, Jobs_month2, Jobs_month3 FROM FACT_jobs.staging2 WHERE Qtr = 1 ON DUPLICATE KEY UPDATE Q1_establishments = Quarterly_establishments, Q1_disclosure = Disclosure_percent, Q1_avg_weekly_wage = Avg_weekly_wage, Jan_jobs = Jobs_month1, Feb_jobs = Jobs_month2, Mar_jobs = Jobs_month3; INSERT INTO FACT_jobs.QCEW (GeoID, Year, Naics, Agglvl_code, Q2_establishments, Q2_disclosure, Q2_avg_weekly_wage, Apr_jobs, May_jobs, Jun_jobs) SELECT GeoID, Year, Naics, Agglvl_code, Quarterly_establishments, Disclosure_percent, Avg_weekly_wage, Jobs_month1, Jobs_month2, Jobs_month3 FROM FACT_jobs.staging2 WHERE Qtr = 2 ON DUPLICATE KEY UPDATE Q2_establishments = Quarterly_establishments, Q2_disclosure = Disclosure_percent, Q2_avg_weekly_wage = Avg_weekly_wage, Apr_jobs = Jobs_month1, May_jobs = Jobs_month2, Jun_jobs = Jobs_month3; INSERT INTO FACT_jobs.QCEW (GeoID, Year, Naics, Agglvl_code, Q3_establishments, Q3_disclosure, Q3_avg_weekly_wage, Jul_jobs, Aug_jobs, Sep_jobs) SELECT GeoID, Year, Naics, Agglvl_code, Quarterly_establishments, Disclosure_percent, Avg_weekly_wage, Jobs_month1, Jobs_month2, Jobs_month3 FROM FACT_jobs.staging2 WHERE Qtr = 3 ON DUPLICATE KEY UPDATE Q3_establishments = Quarterly_establishments, Q3_disclosure = Disclosure_percent, Q3_avg_weekly_wage = Avg_weekly_wage, Jul_jobs = Jobs_month1, Aug_jobs = Jobs_month2, Sep_jobs = Jobs_month3; INSERT INTO FACT_jobs.QCEW (GeoID, Year, Naics, Agglvl_code, Q4_establishments, Q4_disclosure, Q4_avg_weekly_wage, Oct_jobs, Nov_jobs, Dec_jobs) SELECT GeoID, Year, Naics, Agglvl_code, Quarterly_establishments, Disclosure_percent, Avg_weekly_wage, Jobs_month1, Jobs_month2, Jobs_month3 FROM FACT_jobs.staging2 WHERE Qtr = 4 ON DUPLICATE KEY UPDATE Q4_establishments = Quarterly_establishments, Q4_disclosure = Disclosure_percent, Q4_avg_weekly_wage = Avg_weekly_wage, Oct_jobs = Jobs_month1, Nov_jobs = Jobs_month2, Dec_jobs = Jobs_month3;"
rm -r ./temp
done
#!/bin/bash
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
for y in {2008..2023}
do
mkdir -p ./temp
for q in {1..4}
do
wget -O ./temp/temp_q${q}.tsv -N "https://ec.europa.eu/eurostat/api/dissemination/sdmx/3.0/data/dataflow/ESTAT/lfsq_egan22d/1.0?c[sex]=T&c[age]=Y_GE15&c[time_period]=${y}-Q${q}&format=tsv"
done
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" -e "CREATE TEMPORARY TABLE FACT_jobs.staging_1 (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_q1.tsv\" INTO TABLE FACT_jobs.staging_1 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_2 (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_q2.tsv\" INTO TABLE FACT_jobs.staging_1 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_3 (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_q3.tsv\" INTO TABLE FACT_jobs.staging_1 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_4 (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_q4.tsv\" INTO TABLE FACT_jobs.staging_1 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; INSERT INTO FACT_jobs.LFS (GeoID, Year, Nace, EmpTh_Q1) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging_1 ON DUPLICATE KEY UPDATE EmpTh_Q1 = Value; INSERT INTO FACT_jobs.LFS (GeoID, Year, Nace, EmpTh_Q2) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging_2 ON DUPLICATE KEY UPDATE EmpTh_Q2 = Value; INSERT INTO FACT_jobs.LFS (GeoID, Year, Nace, EmpTh_Q3) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging_3 ON DUPLICATE KEY UPDATE EmpTh_Q3 = Value; INSERT INTO FACT_jobs.LFS (GeoID, Year, Nace, EmpTh_Q4) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging_4 ON DUPLICATE KEY UPDATE EmpTh_Q4 = Value;"
rm -r ./temp
done
#!/bin/bash
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
for y in {1995..2022}
do
mkdir -p ./temp
wget -O ./temp/temp.tsv -N "https://ec.europa.eu/eurostat/api/dissemination/sdmx/3.0/data/dataflow/ESTAT/nama_10r_3empers/1.0?c[wstatus]=EMP&c[time_period]=${y}&format=tsv"
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" -e "CREATE TEMPORARY TABLE FACT_jobs.staging (Area varchar(30) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp.tsv\" INTO TABLE FACT_jobs.staging FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; INSERT INTO FACT_jobs.REA (GeoID, Year, Nace, EmpTh) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging ON DUPLICATE KEY UPDATE EmpTh = Value;"
rm -r ./temp
done
#!/bin/bash
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
for y in {2021..2022}
do
mkdir -p ./temp
wget -O ./temp/temp_ENT.tsv -N "https://ec.europa.eu/eurostat/api/dissemination/sdmx/3.0/data/dataflow/ESTAT/sbs_ovw_act/1.0?c[indic_sbs]=ENT_NR&c[time_period]=${y}&format=tsv"
wget -O ./temp/temp_EMP.tsv -N "https://ec.europa.eu/eurostat/api/dissemination/sdmx/3.0/data/dataflow/ESTAT/sbs_ovw_act/1.0?c[indic_sbs]=EMP_NR&c[time_period]=${y}&format=tsv"
wget -O ./temp/temp_LC.tsv -N "https://ec.europa.eu/eurostat/api/dissemination/sdmx/3.0/data/dataflow/ESTAT/sbs_ovw_act/1.0?c[indic_sbs]=LC_EMP_TEUR&c[time_period]=${y}&format=tsv"
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" -e "CREATE TEMPORARY TABLE FACT_jobs.staging_ENT (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_ENT.tsv\" INTO TABLE FACT_jobs.staging_ENT FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_EMP (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_EMP.tsv\" INTO TABLE FACT_jobs.staging_EMP FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_LC (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_LC.tsv\" INTO TABLE FACT_jobs.staging_LC FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; INSERT INTO FACT_jobs.SBS (GeoID, Year, Nace, Enterprises) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-3),',',1), Value FROM FACT_jobs.staging_ENT ON DUPLICATE KEY UPDATE Enterprises = Value; INSERT INTO FACT_jobs.SBS (GeoID, Year, Nace, Employment) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-3),',',1), Value FROM FACT_jobs.staging_EMP ON DUPLICATE KEY UPDATE Employment = Value; INSERT INTO FACT_jobs.SBS (GeoID, Year, Nace, LaborCost) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-3),',',1), Value FROM FACT_jobs.staging_LC ON DUPLICATE KEY UPDATE LaborCost = Value;"
rm -r ./temp
done
#!/bin/bash
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
declare -a state_label=("ak" "al" "ar" "az" "ca" "co" "ct" "dc" "de" "fl" "ga" "hi" "ia" "id" "il" "in" "ks" "ky" "la" "ma" "md" "me" "mi" "mn" "mo" "ms" "mt" "nc" "nd" "ne" "nh" "nj" "nm" "nv" "ny" "oh" "ok" "or" "pa" "pr" "sc" "sd" "tn" "tx" "ut" "va" "vt" "wa" "wi" "wv" "wy")
for y in {2002..2021}
do
for i in "${state_label[@]}"
do
mkdir -p ./temp
wget -N "https://lehd.ces.census.gov/data/lodes/LODES8/${i}/wac/${i}_wac_S000_JT00_${y}.csv.gz" -P ./temp
FILE=./temp/${i}_wac_S000_JT00_${y}.csv
if [ -f "${FILE}.gz" ]; then
gunzip $FILE.gz
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" -e "LOAD DATA LOCAL INFILE '$FILE' INTO TABLE FACT_jobs.LODES8 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2, @col3, @col4, @col5, @col6, @col7, @col8, @col9, @col10, @col11, @col12, @col13, @col14, @col15, @col16, @col17, @col18, @col19, @col20, @col21, @col22, @col23, @col24, @col25, @col26, @col27, @col28, @col29, @col30, @col31, @col32, @col33, @col34, @col35, @col36, @col37, @col38, @col39, @col40, @col41, @col42, @col43, @col44, @col45, @col46, @col47, @col48, @col49, @col50, @col51, @col52, @col53) set GeoID=@col1,Year=${y},CNS01=@col9, CNS02=@col10, CNS03=@col11, CNS04=@col12, CNS05=@col13, CNS06=@col14, CNS07=@col15, CNS08=@col16, CNS09=@col17, CNS10=@col18, CNS11=@col19, CNS12=@col20, CNS13=@col21, CNS14=@col22, CNS15=@col23, CNS16=@col24, CNS17=@col25, CNS18=@col26, CNS19=@col27, CNS20=@col28;"
fi
rm -r ./temp
done
done
#!/bin/bash
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
echo "*** Create database on host: $HOST with user: $USER"
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" < inputs/sql/create_db.sql
echo "*** Create tables on host: $HOST with user: $USER"
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" < inputs/sql/create_tables.sql
CREATE DATABASE IF NOT EXISTS FACT_jobs;
CREATE TABLE FACT_jobs.LODES8 (
CounterID INT auto_increment NOT NULL,
GeoID varchar(15) NOT NULL,
Year INT NOT NULL,
CNS01 INT NOT NULL,
CNS02 INT NOT NULL,
CNS03 INT NOT NULL,
CNS04 INT NOT NULL,
CNS05 INT NOT NULL,
CNS06 INT NOT NULL,
CNS07 INT NOT NULL,
CNS08 INT NOT NULL,
CNS09 INT NOT NULL,
CNS10 INT NOT NULL,
CNS11 INT NOT NULL,
CNS12 INT NOT NULL,
CNS13 INT NOT NULL,
CNS14 INT NOT NULL,
CNS15 INT NOT NULL,
CNS16 INT NOT NULL,
CNS17 INT NOT NULL,
CNS18 INT NOT NULL,
CNS19 INT NOT NULL,
CNS20 INT NOT NULL,
PRIMARY KEY (CounterID)
)
ENGINE=ARIA
DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_general_ci;
CREATE TABLE FACT_jobs.QCEW (
GeoID varchar(5) NOT NULL,
Year INT NOT NULL,
Naics varchar(6) NOT NULL,
Agglvl_code varchar(2) NOT NULL,
Q1_establishments INT,
Q1_disclosure DOUBLE,
Q1_avg_weekly_wage DOUBLE,
Jan_jobs INT,
Feb_jobs INT,
Mar_jobs INT,
Q2_establishments INT,
Q2_disclosure DOUBLE,
Q2_avg_weekly_wage DOUBLE,
Apr_jobs INT,
May_jobs INT,
Jun_jobs INT,
Q3_establishments INT,
Q3_disclosure DOUBLE,
Q3_avg_weekly_wage DOUBLE,
Jul_jobs INT,
Aug_jobs INT,
Sep_jobs INT,
Q4_establishments INT,
Q4_disclosure DOUBLE,
Q4_avg_weekly_wage DOUBLE,
Oct_jobs INT,
Nov_jobs INT,
Dec_jobs INT,
PRIMARY KEY (GeoID,Year,Naics)
)
ENGINE=ARIA
DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_general_ci;
CREATE TABLE FACT_jobs.REA (
GeoID varchar(10) NOT NULL,
Year INT NOT NULL,
Nace varchar(6) NOT NULL,
EmpTh DOUBLE,
PRIMARY KEY (GeoID,Year,Nace)
)
ENGINE=ARIA
DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_general_ci;
CREATE TABLE FACT_jobs.LFS (
GeoID varchar(10) NOT NULL,
Year INT NOT NULL,
Nace varchar(3) NOT NULL,
EmpTh_Q1 DOUBLE,
EmpTh_Q2 DOUBLE,
EmpTh_Q3 DOUBLE,
EmpTh_Q4 DOUBLE,
PRIMARY KEY (GeoID,Year,Nace)
)
ENGINE=ARIA
DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_general_ci;
CREATE TABLE FACT_jobs.SBS (
GeoID varchar(2) NOT NULL,
Year INT NOT NULL,
Nace varchar(12) NOT NULL,
Enterprises INT,
Employment INT,
LaborCost DOUBLE,
PRIMARY KEY (GeoID,Year,Nace)
)
ENGINE=ARIA
DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_general_ci;
DROP DATABASE IF EXISTS FACT_jobs;
#!/bin/bash
echo "*** Updating system:"
apt update -y && apt upgrade -y
#!/bin/bash
# Function to display help
function show_help() {
echo "Usage: $0 <host> <user> <password>"
echo ""
echo "This script sets up the database and loads the data."
echo ""
echo "Arguments:"
echo " host Database host"
echo " user Database user"
echo " password Database password"
}
# Check if help is requested
if [[ "$1" == "-h" || "$1" == "--help" ]]; then
show_help
exit 0
fi
# Check if all three parameters are provided
if [[ $# -ne 3 ]]; then
echo "Error: Missing arguments."
show_help
exit 1
fi
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
# Run the individual scripts, passing host, user, and password
# create database
./inputs/setup_db.sh "$HOST" "$USER" "$PASSWORD"
# load LODES data into DB
./inputs/load_lodes.sh "$HOST" "$USER" "$PASSWORD"
# load BLS data into DB
./inputs/load_bls.sh "$HOST" "$USER" "$PASSWORD"
# load EUROSTAT data into DB
./inputs/load_eurostat_rea.sh "$HOST" "$USER" "$PASSWORD"
./inputs/load_eurostat_lfs.sh "$HOST" "$USER" "$PASSWORD"
./inputs/load_eurostat_sbs.sh "$HOST" "$USER" "$PASSWORD"
# dump data and drop database on 'local' instance
./inputs/dump_and_drop.sh "$HOST" "$USER" "$PASSWORD"